ABSTRACT


The test results that are obtained from a test-retest 
design, whether or not the same sample was employed in the 
two test administrations, are often compared in order to 
determine an achievement difference score. But, can the 
researcher be assured that the same latent trait was being 
measured each time that the test was administered? 

The purpose of this thesis was to demonstrate an 
application of latent trait theory which enables the 
researcher to assess whether or not those tests that are 
employed in a test-retest design are measuring the same 
latent trait each time that the test was administered. The 
methodology was devised from the premise that a test item 
should assess achievement in a similar manner on each test 
administration, across all levels of student ability. This 
assessment was made by computing a probable item difference 
score between the two groups of examinees for each test item 
by first computing the equated item characteristic curves 
and then multiplying the differences (between groups) of the 
probability of success for a given student ability level by 
the proportion of total scores that were found at that 
ability level; then summing across all ability levels. Those 
items that were found to have a probable item difference 
score of greater than 0.05 were considered to be aberrant 
items that should not be included in the final mean 


achievement difference score; i.e., the examinees having the 
same ability level in each study group that the test was
administered to, did not have the same probability of 
correctly answering the aberrant item. 
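
The computation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the program used in the thesis: the function names, the three-parameter logistic form of the curve, the scaling constant D = 1.7, the ability grid, and the proportions are all hypothetical, and the sketch assumes that the signed difference is summed and that its absolute value is compared with 0.05.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed here)


def icc(theta, a, b, c=0.0):
    """Probability of success at ability theta (three-parameter logistic form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def probable_item_difference(params_1, params_2, ability_levels, proportions):
    """Sum over ability levels of (P_group2 - P_group1) times the proportion
    of examinees found at that ability level."""
    total = 0.0
    for theta, weight in zip(ability_levels, proportions):
        total += (icc(theta, *params_2) - icc(theta, *params_1)) * weight
    return total


# Hypothetical ability grid and proportions of examinees at each level.
ability_levels = [-2.0, -1.0, 0.0, 1.0, 2.0]
proportions = [0.1, 0.2, 0.4, 0.2, 0.1]

# Identical equated curves in both groups: no difference.
same = probable_item_difference((1.0, 0.0, 0.2), (1.0, 0.0, 0.2),
                                ability_levels, proportions)
# The item is harder (b = 0.5) for the second group: a negative difference.
shifted = probable_item_difference((1.0, 0.0, 0.2), (1.0, 0.5, 0.2),
                                   ability_levels, proportions)
is_aberrant = abs(shifted) > 0.05
print(same, shifted, is_aberrant)
```

In this sketch an item with identical curves in both groups receives a score of zero, while the item whose difficulty shifts between groups exceeds the 0.05 criterion.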

In order to illustrate this application of latent trait
theory, a portion of the data that were collected for the
Edmonton Grade III Achievement Study (Clarke, Nyberg, &
Worth, 1977) was employed. This study administered a battery
of tests to the Edmonton grade III public school students in
1956 and again to the Edmonton grade III public school 
students in 1977 in order to determine achievement 
differences across 21 years. The results indicate that there 
were items within this test battery that were not measuring 
the same latent trait each time that they were administered. 
When the data from the aberrant items were removed, the 1977 
students appeared to have higher abilities, as measured by 


the selected tests, than did the 1956 students. 
I. INTRODUCTION


There are many situations that arise in education, industry, 
and general research that necessitate the comparison and 
evaluation of two or more test scores obtained at different 
times. Tyler (1942) points out that presumably 

    each of these scores or verbal summaries could be
    compared with scores or verbal summaries previously
    obtained; by this comparison some estimate of the
    degree of change or growth of students could be
    made.
While such comparisons can be made, a question that arises 
is whether or not the evaluator can be assured that the 
test(s) are measuring the same attribute or trait in each 
testing situation. Common sense would dictate that if two or 
more measurements are to be compared, then it is imperative 
that the test(s) measure the same underlying construct or 
trait, and that the results be reported in the same metric. 
However, history has demonstrated that this intuitive 
approach is easier to state than substantiate (e.g., 
Lumsden, 1976). This may be accounted for by the very nature 
of the indirect measurements required of psychological 
researchers. Traditionally, the physical scientist had few 
problems when making comparisons between test results of the 
same metric because he obtained his test results via direct 
measurement; e.g., the measurement of the physical size of 


an object. Thus, the psychometrician is confronted with the 


uncertainties inherent in the use of psychological 
measurements. These measurements are not only indirect, but
they are often based upon deductions from hypothetical 
constructs of what might constitute a particular human trait 
or measurable attribute. For example, because intelligence 
is not a tangible substance, a psychologist measures it 
indirectly; via an instrument that is based on a belief 
about what intelligence is as an hypothetical construct; and 
how it is manifested in some observable fashion. 

Educational or psychological testing, then, usually 
involves the translation of some unobservable human 
characteristics defined as traits, constructs, or
abilities.¹ Because these traits are unobservable, they are
often referred to as being latent. A person’s score on an 
achievement test can then be described by his/her standing 
on a particular latent trait (e.g., mathematical ability, 
verbal ability, or motivational level). 

In simple physical measurement, we believe that a ruler 
measures length with the same validity regardless of whether 
or not a desk or a field is being measured. In educational 
measurement, even if we are convinced that the same trait is 
being measured each time a given test is administered, we do 
not necessarily know that the obtained measure will be
invariant over samples; that is, it might be measuring 
achievement in different ways for different samples. 

In education, researchers may want to assess whether or 


not students of today know as much about some subject as


¹ These terms are usually used interchangeably because there
are no universally accepted definitions. 
other students before them knew (e.g., Muir, 1961; Hedges,
1977). If 20 minutes, 20 days, or 20 years have passed since 
the original students were measured on some ability, can the 
researcher be assured that the same test is measuring the 
same latent trait(s) that was measured before? Or, put 
another way, is the latent trait being measured in the same 
way? 

If the same underlying trait is being assessed each 
time the test is administered, then one would hope that the 
items composing the test would contribute the same type of 
information to the overall test score. Thus, if an item is 
assessing achievement in different ways on different 
administrations, then the test may be favoring or biasing 
the test results on the basis of which sample was 
administered the test. As the total score is a function of 
the individual item scores; the total amount of test bias 
that exists is a function of the bias contributed by each 
item. If the test results obtained for two or more samples 
are going to be compared, then a test item should not be 
biased toward any sample on any one specific test
administration. 

Classical test theory and latent trait theory have both 
been used to establish methods of determining item bias 
(e.g., Lord & Novick, 1968; Lord, 1952, 1976a). Classical
test theory is based upon a model that advocates that every
"observed" score is composed of a "true" score component and
an "error" score component (O = T + E) (Thurstone, 1961;
Gulliksen, 1950). While this model (O = T + E) makes
intuitive sense, there are some problems that must be 
contended with; such as, the true score is not known and can
not ever be known; and there is no assumption made about the
frequency distribution of true scores in the sample to which 
it is applied. Thus, the measures used to assess item bias 
are dependent upon the group performance from which they 
were derived. Or, in the words of Hambleton, Swaminathan, 
Cook, Eignor, and Gifford (1977),

    We create a situation of bias and then try to use
    the mechanism that created the problem in the first
    place to investigate it [item bias].

Two measures are used to determine if a test item is 
performing in a similar manner each time that it is 
administered. The first measure is item difficulty which is 
defined as the proportion of examinees who answered an item
correctly. The second measure is item discrimination which 
is a measure of how well a test item discriminates between a 
person who does well on the overall test versus a person who 
does poorly on the overall test. Item bias across samples is 
thought to exist if there is a significant difference 
between the measures of item difficulty or item 
discrimination as they are calculated from one sample of 
examinees and those same measures calculated from a second 
sample. 
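
The two classical measures can be illustrated with a short sketch. The function names and the response data are hypothetical, and the use of the point-biserial correlation between item score and total score as the discrimination index is an assumption; the text above does not specify which discrimination index was computed.

```python
import math


def item_difficulty(item_scores):
    """Proportion of examinees who answered the item correctly (scores are 0/1)."""
    return sum(item_scores) / len(item_scores)


def item_discrimination(item_scores, total_scores):
    """Point-biserial correlation between the 0/1 item score and the total score
    (one common discrimination index, assumed here for illustration)."""
    n = len(item_scores)
    mean_u = sum(item_scores) / n
    mean_x = sum(total_scores) / n
    cov = sum((u - mean_u) * (x - mean_x)
              for u, x in zip(item_scores, total_scores))
    var_u = sum((u - mean_u) ** 2 for u in item_scores)
    var_x = sum((x - mean_x) ** 2 for x in total_scores)
    return cov / math.sqrt(var_u * var_x)


# Hypothetical data: six examinees, one item, and their total test scores.
item = [1, 1, 0, 1, 0, 0]
total = [9, 8, 4, 7, 5, 3]

difficulty = item_difficulty(item)
discrimination = item_discrimination(item, total)
print(difficulty, round(discrimination, 3))
```

Both indices are computed entirely from the responses of the particular sample tested, which is the source of the difficulty discussed next.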

In order to illustrate the problems that arise with the 
classical test theory approach to item bias, let us consider 


the concept of item difficulty as it is applied to the 
following example. If the same test item of medium
difficulty were administered to two groups of examinees, one 
of which was composed of high ability examinees and one of 
which was composed of low ability examinees, then the test 
item would be shown to be difficult in the case of the low 
ability group and easy in the case of the high ability 
group. Hence, the classical measure of item difficulty 
describes the difficulty of the item in relation to the 
group of examinees being tested; i.e., the classical measure 
of item difficulty is not independent of the frequency 
distribution of scores within the sample. A parallel 
argument may be presented with regard to item 
discrimination. Because item bias is confounded with sample 
differences, the traditional methods by which item 
difficulty and item discrimination are computed, may be 
inappropriate in order to assess item bias accurately. 
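
The sample dependence of the classical difficulty index can be demonstrated numerically. In the following sketch a single item with one fixed response curve is given to a low ability group and a high ability group. The logistic form of the curve, the constant 1.7, the function names, and the ability values are assumptions chosen only for illustration.

```python
import math


def p_correct(theta, a=1.0, b=0.0):
    """Probability of a correct answer for one fixed item of medium difficulty
    (two-parameter logistic form, assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))


def expected_proportion_correct(abilities):
    """Expected classical item difficulty (proportion correct) in a group."""
    return sum(p_correct(t) for t in abilities) / len(abilities)


# Hypothetical ability values for the two groups of examinees.
low_group = [-2.0, -1.5, -1.0, -0.5, 0.0]
high_group = [0.0, 0.5, 1.0, 1.5, 2.0]

p_low = expected_proportion_correct(low_group)
p_high = expected_proportion_correct(high_group)
print(round(p_low, 3), round(p_high, 3))
```

The item itself is unchanged, yet its classical difficulty falls below .5 in the low ability group and rises above .5 in the high ability group.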

An alternative to the classical test theory approach to 
the measurement of item bias is provided by latent trait 
theory (Lord, 1952, 1953; Lord & Novick, 1968). As latent 
trait theory works from an assumption of sample invariance, 
the item parameters (discrimination and difficulty) are 
independent of group performance. Thus, test bias or item 
bias may be assessed on the assumption that those subjects 
in different samples who have the same ability level have
the same probability of getting a test item correct. This 
sample invariant property will then allow an investigator to 


assess test or item bias more accurately than the more 
traditional methods.
Purpose 

The purpose of this thesis was to demonstrate an 
application of latent trait theory which assessed whether or 
not selected tests, that were employed in experimental 
designs requiring the evaluation of test-retest data or the 
testing of two or more different groups of examinees with 
the same test, were measuring the same latent trait(s) each 
time that they were employed. This, in turn, allowed the 
assessment of the validity of making a comparison between 
the different test results. The basis on which the 
comparison of the data between groups was made was derived 
from the premise that those test items that were not 
appearing to be measuring the same underlying trait(s) in 
both test administrations were not to be included in the 
final comparisons. 

In order to illustrate this application of latent trait 
theory, a portion of the data that were collected for the 
Edmonton Grade III Achievement Study (Clarke, Nyberg, &
Worth, 1977) was employed in this thesis. The Edmonton Grade
III Achievement Study incorporated a battery of tests that
was administered to all the Grade III students in the
Edmonton Public School system in 1956. An almost identical
battery of tests was administered to all the Grade III
students in the Edmonton Public School system in 1977. Thus 
we have a situation where the same tests were administered 


to two groups of examinees, 21 years apart. A co-purpose, 
then, was to determine, through latent trait theory, whether
or not the results of the tests of these two different test
administrations could be directly compared with each other
in order to assess achievement differences between the two
groups when they were tested 21 years apart. Furthermore,
since some of the test items employed in the Edmonton Grade
III Achievement Study were found to be favoring one or the
other group of students, the achievement differences
between the two study years were calculated when these data
were removed prior to making the comparisons.

In addition, Clarke et al. (1977) reported the item 
statistics (item difficulty and item discrimination) of the
items used in the Edmonton Grade III Achievement Study that
were calculated by classical test theory methodology.
Although the item statistics that were calculated by 
classical test methodology were not directly comparable to 
the item statistics derived from latent trait theory, these 
results did allow a comparison of the implications of the 
results of the two different approaches to be made. 

In summary, then, this thesis: 

1. Demonstrated an application of latent trait theory for
   determining item bias in tests that are administered on
   different occasions. This, in turn, allowed for the
   determination of the validity of comparing data
   obtained 21 years apart in order to assess achievement
   differences between the different groups of examinees.
2. Determined the ability differences between the 1956 and
   the 1977 Edmonton Public School Board grade III
   students, as measured by the application of latent
   trait theory, on a portion of the data obtained from
   the Edmonton Grade III Achievement Study.
3. Provided a discussion of the differences of the
   implications of the test statistics that were computed
   from these data via classical test theory methodology
   and latent trait theory methodology.
The following chapter provides a review of related 
literature as it pertains to (a) latent trait theory and (b) 


the Edmonton Grade III Achievement Study. 
II. REVIEW OF RELATED LITERATURE


A. Latent Trait Theory 

A general theory of latent traits presupposes that an 
individual’s behavior can be predicted or explained by 
defining human characteristics, called traits, and that a 
person’s performance on any given trait is based on the 
ability s/he possesses on that underlying trait (Lord & 
Novick, 1968). As these traits are measured indirectly, they 
are referred to as being latent. Hence, the observed 
examinee’s test result and the underlying trait or ability 
should be related. 

In latent trait theory, the relationship between item 
performance and ability is expressed by the item 
characteristic curve: a plot of the probability of getting
the item correct as a function of the ability level 
expressed in ability units. 

An important property of latent trait theory is that 
estimates of item characteristic curves do not depend on the 
invariant distributions of a given ability across different 
groups of examinees (Birnbaum, 1968). In fact, it need not 
be necessary to use the identical test forms in order to 
make valid comparisons across different groups of examinees, 
as long as the tests measure the same traits. These 
properties should allow researchers to assess changes of 
construct validity across time for different groups of 


examinees. In deriving the working mathematical model for 
latent trait analysis, Birnbaum (1968) outlines three
fundamental notions that apply in general to latent trait 
theory: The dimensionality of the latent space, local 
independence, and item characteristic curves. 
Dimensionality 

Any examinee’s test score can be expressed as being 
represented by k latent traits. If the examinee’s test score 
is determined by a single ability, then the trait is 
unidimensional. If the examinee’s test score is represented 
by more than one ability, then the test is said to be 
multidimensional. These k traits can be said to form a
k-dimensional space in which the examinee can be positioned.
Most tests attempt to measure a single ability or trait. 
Consequently, many latent trait models make the assumption 
of unidimensionality of a test. Factor analysis is usually 
advocated to assess the dimensionality of a given test 
(e.g., Hambleton & Traub, 1973; Lumsden, 1976) in order to 
ascertain whether a unidimensional or multidimensional 
latent trait model would be more appropriate for a given 
test. However, Rentz and Rentz (1979) suggest that 

    factor analysis does not clarify the dimensionality
    because factor analysis is itself a model (or a
    number of models) with several, sometimes
    conflicting, concepts of dimensionality.
The problem is further compounded when a researcher is
interested in the performance of a test administered on two
separate occasions; i.e., if there is a difference in the 
composition of the samples employed, e.g., if one sample was 


heterogeneous and another sample was homogeneous, then the 
resulting dimensionalities as measured by factor analysis
may be different due to sample configuration and not due to 
the way in which the items were measuring the underlying 
trait. Thus, there may be no clear definition of 
unidimensionality beyond the mathematical definition (Rentz &
Rentz, 1979). If the test were measuring the same underlying 
trait each time that it was administered, then it would be 
assumed that the test dimensionality would be the same. 
Local Independence 

Local independence means that the item scores are 
related to each other only through the latent variables that 
are being measured; i.e., 

    within any group of examinees all characterized by
    the same values θ1, θ2, ..., θk, the (conditional)
    distributions of the item scores are all
    independent of each other. (Lord & Novick, 1968)
This implies that for a fixed ability level, the joint 
probability of the performance of item i and item j equals 
Pi x Pj when these items are locally independent; e.g., if 
an examinee had a probability of .7 of answering item i 
correctly and a probability of .8 of answering item j 
correctly, then the examinee has a probability of (.7) x 
(.8) = .56 of passing both items. Similarly, if one examinee 
had a probability of .4 of answering item i correctly and 
another examinee of the same ability had a probability of .5 
of answering item i correctly, then the probability of
both examinees answering item i correctly will be (.4) x
(.5) = .20. [...] for a fixed ability


level, the probability of an examinee answering item j 
were not the case, then more than one trait would be needed
in order to explain differences that might arise across 
different examinees of equal ability. In essence this can 
only be achieved if all of the test items measure a single 
trait. Hence, the techniques used to measure 
unidimensionality are also appropriate to establish whether 
or not the test items are locally independent, but they are
also subject to the same problems. 
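
The arithmetic of local independence can be checked with a small sketch. The function name is hypothetical; the probabilities are those of the worked example in the text (.7 for item i and .8 for item j at a fixed ability level).

```python
def pattern_probability(item_probabilities, responses):
    """Probability of a 0/1 response pattern at one fixed ability level,
    assuming local independence: the product of the per-item probabilities."""
    probability = 1.0
    for p, u in zip(item_probabilities, responses):
        probability *= p if u == 1 else (1.0 - p)
    return probability


# Items i and j from the example: P_i = .7 and P_j = .8 at a fixed ability level.
both_correct = pattern_probability([0.7, 0.8], [1, 1])

# Under local independence the four possible patterns exhaust all outcomes.
all_patterns = sum(pattern_probability([0.7, 0.8], [ui, uj])
                   for ui in (0, 1) for uj in (0, 1))
print(round(both_correct, 2), round(all_patterns, 2))
```

The same product rule, extended over all items of a test, gives the likelihood of an examinee's whole response pattern, which is the quantity maximized when abilities and item parameters are estimated.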
Item Characteristic Curve 

As noted earlier, the mathematical function that 
relates the probability of achievement on a test item to the 
ability measured by the test on the whole, is known as the
item characteristic curve. This relationship takes the form
of a nonlinear regression function of the item score on the
latent trait represented by the test (Hambleton & Cook, 
1977). An important property of the item characteristic 
curve is that the distribution of ability within a given 
group of examinees will not affect the shape of the item 
characteristic curve. Thus, if the item characteristic curve 
for two groups of examinees differed when being assessed on 
the same underlying trait, then the differences could not be 
explained in terms of differences in ability; but rather 
that the item is not measuring the same trait in the same 
way in both cases. 

Hence, the item characteristic curve is the prime 


instrument that can be used to establish whether or not an 
item is performing in the same manner each time that it is
administered. When a researcher compares the means of each 
sample that s/he obtained on a given test administration, 
then s/he is assuming that each test item is performing in 
the same manner on each of those test administrations and 
that the mean difference between those samples reflects a 
true difference of achievement. While this assumption is 
rarely tested, it would appear that, as the sample varies in 
composition and/or as the inter-test interval increases, 
then the probability increases that the test items may not 
be performing in the same manner on both test 
administrations. Consequently, the mean achievement 
differences between samples may not reflect a true 
achievement difference, but rather an achievement difference 
confounded by differences in the performance characteristics 
of the instrument; i.e., item bias. It would appear that the 
best way to assess item bias that arises when a test is 
administered on two or more occasions lies within latent 
trait theory. 
Different Types of Latent Trait Models

Latent trait models differ from each other in the 
number of item parameters used to produce the item 
characteristic curve and with respect to the assumed 
fundamental relationship between ability and the probability 
of success. Lord (1952, 1953) developed a latent trait model 
where the item characteristic curve takes the form of the 


normal ogive. An examinee’s performance was related to 
ability and to two item parameters: item difficulty and item
discrimination. Birnbaum (1968) altered this model by using 
the logistic cumulative function instead of the normal-ogive 
function. Birnbaum also added a third item parameter (c) in 
order to account for the probability of guessing a correct 
response on a given item for an examinee of a given ability 
level. 

The one-parameter logistic model, known as the Rasch
model (Rasch, 1966), is recognized as a special case of
Birnbaum’s two-parameter logistic model. The Rasch item
characteristic curves are calculated under the condition
where all items are considered to have equal discriminating
power and where guessing is not considered to be a factor;
the sole parameter is a difficulty parameter² (Wright,
1968). The Rasch model has some advantages over the two- or
three-parameter models because it is easier to compute the 
examinee’s abilities and the computational problem of trying 
to estimate a larger number of different parameters is 
avoided. However, the degree of robustness of the Rasch 
model may play an important role in deciding the 
appropriateness of its application (Hambleton & Traub, 
1973).

Samejima (1969, 1972, 1973) has introduced several 
different types of latent trait models to accommodate those 
tests that are not scored in the dichotomous fashion 


² The Rasch model was developed independently of Birnbaum’s
model and Rasch is usually given credit for recognizing the 
usefulness of a one-parameter model. 
required by the different latent trait models that have been
discussed so far. The nominal response model introduced by 
Bock (1972) and Samejima (1972) can be used when a given 
test is scored multichotomously as opposed to dichotomously; 
i.e., there are a number of alternatives from which the
examinees may make their selection. The nominal response
model allows for the calculation of "item option
characteristic curves" for each respective item option. If
the item responses of a given test can be ordered; i.e., the 
alternatives can be ordered in terms of correctness, then 
the graded response model (Samejima, 1969) would be able to 
accommodate the different type of item responses by 
producing an "operating characteristic" based upon the
two-parameter logistic model. A variation of the graded 
response model is the continuous response model (Samejima, 
1973). This model was introduced by Samejima in order to 
accommodate those tests that require the flexibility of 
allowing the examinee to respond on a continuum. This type 
of response often occurs with studies investigating 
affective traits. 

While these different latent trait models were 
developed to accommodate a particular data configuration, 


each model is based on the general form: 
Thus, the probability of a correct response by an examinee
with a given ability on item i, is described by the item 
parameters a (discrimination) and b (difficulty). The 
different models may incorporate a different number of 
parameters, but each model will define precisely what the 
probability of success will be on a given test item for a 
particular ability level if the item parameters are known.


For the three-parameter logistic model, the function is 
For the two-parameter logistic model, the lower asymptote
parameter (c) has a value of zero.³ For the one-parameter
logistic model (also known as the Rasch model) the
discrimination (a) parameter is a constant. The problem is
that these parameters are not known, so they must be
estimated. 
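
The nesting of the three logistic models can be made concrete with a short sketch. It assumes the commonly cited form of the three-parameter logistic function, P(θ) = c + (1 - c) / (1 + exp(-D a (θ - b))), with the scaling constant D = 1.7; the function names, the value chosen for the constant discrimination in the one-parameter case, and the example parameter values are hypothetical.

```python
import math

D = 1.7  # scaling constant that brings the logistic close to the normal ogive


def three_parameter(theta, a, b, c):
    """Three-parameter logistic curve: a = discrimination, b = difficulty,
    c = lower asymptote (the guessing parameter)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def two_parameter(theta, a, b):
    """Two-parameter model: the lower asymptote c is fixed at zero."""
    return three_parameter(theta, a, b, 0.0)


def one_parameter(theta, b, a=1.0):
    """One-parameter (Rasch-type) model: discrimination is a constant.
    The constant a = 1.0 is an arbitrary choice for this sketch."""
    return two_parameter(theta, a, b)


# At theta = b, the one- and two-parameter curves pass through .5, while the
# three-parameter curve passes through (1 + c) / 2.
at_b_2pl = two_parameter(0.0, 1.2, 0.0)
at_b_3pl = three_parameter(0.0, 1.2, 0.0, 0.2)
print(at_b_2pl, round(at_b_3pl, 3))
```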


Methods of Obtaining the Item Characteristic Curve Parameters

There does not appear to be any one method of obtaining 
the item characteristic curve parameters that can be used 
with all testing situations. Consequently, several methods 
have evolved by which the item parameters may be estimated. 
In selecting an estimation procedure, the researcher must 
decide on the number of item parameters which s/he wishes to 
work with and the method by which the test items are to be 
scored. If the test items discriminate equally and guessing 
does not appear to be a factor, then the Rasch model would 
be the most appropriate model to use. If the test items were 
scored dichotomously and did not discriminate equally, then 
the two-parameter logistic model would be appropriate. If 
guessing was also a factor, then the three-parameter 
logistic model would be an appropriate model to choose. If 
there were several alternative answers that the examinee 
could choose from, then the researcher could consider one of 
the nominal response models; i.e., the graded response model 


if the alternatives could be ordered for correctness, or the 


³ This third parameter (c) is sometimes referred to as the
guessing parameter. 
continuous response model if the items are marked on a
continuum. Having selected the model that the researcher is 
going to follow, then the examinee abilities and the item 
parameters must be estimated. 

The maximum likelihood method of estimating the item 
parameters has been the most common approach used to date. 
This method usually assumes a unidimensional latent space 
that has been defined by test items that have been scored 
dichotomously. Birnbaum (1968) points out that (i x n) + (N
- 2) likelihood equations need to be solved (where i = the
number of item parameters, n = the number of test items, and
N = the number of examinees) in order to obtain the maximum
likelihood estimates of the item parameters. The method 
suggested by Birnbaum (1968) to solve the equation set 
involves estimating initial values for the item parameters 
and the ability estimates in the likelihood equation set and 
adjusting them in small steps to fit the data. The 
estimation procedure is considered converged when the 
difference between the successive item parameter estimates 
becomes less than the errors of calculation, causing the 
criterion function to fluctuate (Wood, Wingersky, & Lord, 
1976). Different methods by which the initial item parameter 
estimates are made, the type of algorithm used to produce 
the item parameters, and the type of constraints that are 
placed on that algorithm in order to provide the best 
solution have been advocated by different researchers (e.g., 


Birnbaum, 1968; Bock, 1972; Anderson, 1970; Wright &
Douglas, 1977). The issues that are being debated by these
and other researchers arise from the desired objective of 
finding an efficient algorithm in relation to the final item 
parameters that are estimated. The iterative process is very 
time consuming to take to completion and is, therefore, 
costly. A summary of the different methods used to estimate 
the item parameters was presented by Hambleton et al. 

(1977). The iterative procedure needed to obtain a solution 
necessitates the use of a computer. 
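
The iterative character of maximum likelihood estimation can be illustrated on a much reduced problem. The sketch below estimates a single examinee's ability by Newton's method while treating the item parameters as already known. It is not the joint estimation procedure described by Birnbaum (1968), in which item parameters and abilities are adjusted together; the two-parameter logistic form, the function names, the item parameters, and the response pattern are all assumptions made for illustration.

```python
import math

D = 1.7  # logistic scaling constant (assumed)


def p2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def estimate_ability(responses, items, theta=0.0, tol=1e-6, max_iter=50):
    """Newton-Raphson maximum likelihood estimate of ability.

    responses: list of 0/1 item scores; items: list of (a, b) pairs that are
    treated as known. An all-correct or all-wrong pattern has no finite
    maximum likelihood estimate, so it is rejected."""
    if all(responses) or not any(responses):
        raise ValueError("response pattern must contain both 0s and 1s")
    for _ in range(max_iter):
        gradient = 0.0      # first derivative of the log-likelihood
        information = 0.0   # negative of the second derivative
        for u, (a, b) in zip(responses, items):
            p = p2pl(theta, a, b)
            gradient += D * a * (u - p)
            information += (D * a) ** 2 * p * (1.0 - p)
        step = gradient / information
        theta += step
        if abs(step) < tol:   # successive estimates no longer change
            break
    return theta


# Hypothetical item parameters (a, b) and one examinee's responses.
items = [(1.0, -1.5), (1.2, -0.5), (0.8, 0.0), (1.0, 0.5), (1.1, 1.5)]
responses = [1, 1, 1, 0, 0]
theta_hat = estimate_ability(responses, items)
print(round(theta_hat, 3))
```

The stopping rule mirrors the convergence idea described above: iteration ends once successive estimates differ by less than a small tolerance.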

The LOGIST computer program written by Wood, Wingersky, 
and Lord (1976), computes estimates of examinee ability and 
item characteristic curve parameters using an iterative
process described as a modified Newton’s method. The test
item parameters are estimated via the maximum likelihood
methods described by Lord (1968, 1974) which, in turn, are 
based upon Birnbaum's three-parameter logistic model 
(Birnbaum, 1968). If a researcher were interested in 
estimating the item parameters according to the Rasch model 
using a corrected unconditional maximum likelihood
procedure, then BICAL, written by Wright and Mead (1976), 
would be an appropriate choice. One of the problems that a 
researcher experiences at this time is the lack of a wide 
distribution and/or selection of computer programs dealing 
with latent trait theory. However, the amount of information 
on latent trait theory is steadily growing (e.g., the summer 
of 1977 issue of the Journal of Educational Measurement 


dealt entirely with the subject of latent trait theory) and 
the influence of latent trait theory is spreading within the
research community. 
Applications of Latent Trait Theory 

The majority of the literature reported on the subject 
of latent trait theory
    has been intended for measurement theorists,
    without hints on practical application, highly
    symbolic and, perhaps concise to the point of
    obfuscation. (Jaeger, 1977)

However, the practical application of latent trait theory is 
emerging; primarily due to the availability of new computer 
programs (e.g., Wood, Wingersky, & Lord, 1976) and the high
speed computers necessary to run them. Hence, the bridge 
between theory and application is being built and is forming 
the basis of "a revolution in the field of educational
measurement" (Marco, 1977). This revolution is occurring in 
response to some of the problems that have been associated 
with classical test theory (see Lumsden, 1976). 

Thus far, latent trait theory has been demonstrated as 
being able to provide acceptable methodology in approaching 
many statistical problems (see Lord, 1977). In summary,
Lord (1977) reports that item characteristic theory 

    provides us with the frequency distribution f of
    test score(s) for examinees having a specific level
    of ability or skill.
This, then expands the information that is normally obtained 


from classical test theory methodology. 
    if we have a pool of pretested items, all measuring
    the same trait or ability, we can predict the mean,
    variance, reliability, and raw score frequency
    distribution of any test constructed from these
    items once we know the ability levels in the group
    to be tested.

Individualized or tailored tests are made possible, then, by
estimating the examinee’s ability and then providing a set
of questions that will maximize the information that can be
obtained.⁴ Item banking then becomes viable when latent trait
theory is applied to the selection of items that constitute
any individual test (Urry, 1977).

Latent trait theory also provides for the differential 
weighting of response alternatives. The nominal response 
model (Samejima, 1972; Bock, 1972) has been applied to those 
tests where the options to each item can be weighted with 
regard to degree of correctness (e.g., Thissen, 1976). 

⁴ This is especially true of the information obtained at the
extremes of the ability distribution (Hambleton et al.,
1977).

Another application of latent trait theory emerged when
Lord (1973) estimated the power scores for a group of
examinees who had taken a mistimed verbal aptitude test.
This application of latent trait theory was made possible
because the examinees had answered sufficient items to allow
the estimation of the examinee’s ability. Once this had been
accomplished, the probability of the examinee getting the
remaining items correct was calculated. An estimate of the
examinee’s power score was then obtained by summing the
probable scores on the unanswered items and the examinee’s
obtained score. This procedure was possible only because
latent trait theory provides the methodology to determine
the probability of an examinee getting a correct score for a
given ability level.
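
The power score estimate described above can be sketched in a few lines. This is an illustration only: it assumes the three-parameter logistic form with scaling constant D = 1.7 for the probability of a correct response, and the function names and the item values are hypothetical rather than taken from Lord (1973).

```python
import math


def p_correct(theta, a, b, c, D=1.7):
    """Assumed three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def estimated_power_score(obtained_score, theta, unanswered_items):
    """Obtained score plus the probable scores on the items not reached.

    unanswered_items is a list of (a, b, c) item parameter tuples.
    """
    probable = sum(p_correct(theta, a, b, c) for (a, b, c) in unanswered_items)
    return obtained_score + probable


# Hypothetical examinee: 30 items correct, estimated ability 0.5,
# and three items left unanswered when time ran out.
unreached = [(1.0, 0.5, 0.2), (0.8, 1.5, 0.2), (1.2, -1.0, 0.25)]
print(round(estimated_power_score(30, 0.5, unreached), 3))
```

Each unanswered item contributes a value between its lower asymptote and 1, so the estimate always lies between the obtained score and the obtained score plus the number of unanswered items.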

In addition, latent trait theory has been applied to
test data in order to determine the equivalence of different
forms of tests measuring the same trait. Lord (1975) has
reported a summary of the different methods by which this
can be achieved. However, the basic procedure involves
equating the student ability estimates across the
different tests that are purported to be measuring the same
ability.

Latent trait theory has also been applied to the 
measurement of item bias. While the concept of test bias has 
been an underlying concern of past researchers, the general 
social concern about such issues is also rising in 
connection with the testing and the classification of 
minority groups or the competency evaluation of students 
(es gise Ando tteano 75 )-, 

In latent trait theory, if the test items are measuring 
the same latent trait each time a given test is administered 
then the item parameters will be linearly related across the 
different groups of examinees. This approach has been used 
with both the three-parameter logistic model (e.g., Lord, 
1976a) and the Rasch model (e.g., Wright, Mead, & Draba,
1976).

Wright, Mead, and Draba (1976) report having applied
the Rasch model to compute goodness-of-fit residuals for
between-group differences in order to assess item bias. For
this approach to be feasible, however, the data should meet
the requirements of the Rasch model and, as mentioned
earlier, some researchers report that the robustness of this
model has not yet been fully tested (e.g., Hambleton & Traub,
1971).

Lord (1976a) reported the beginning stages of the
development of an asymptotic significance test based on the
summed variance-covariance matrices of the difficulty (b)
and the discrimination (a) parameters. But, as was reported
by Lord (1976a),

     it is not presently possible to specify with
     certainty . . . the asymptotic standard error of
     the maximum likelihood estimates used.

A third approach to measuring item bias has been
reported by Rudner (1977). In this study, he computed the
area of the differences between the equated item
characteristic curves. This method appears to be the most
feasible approach to measuring item bias by latent trait
methodology to date, but the range of abilities (-5.00 to
+5.00) over which he calculates the area of difference
between the two item characteristic curves may be extreme;
i.e., he does not account for the difference between the
number of examinees found at the extreme ability levels
(which will be low) and the number of examinees found at the
medium ability levels (which will be high).
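
The unweighted area measure discussed above can be illustrated with a short sketch. It assumes the three-parameter logistic form with D = 1.7 for the item characteristic curve and a simple midpoint rule for the integration; the names `icc` and `unweighted_area` and the step size are illustrative choices, not details taken from Rudner (1977).

```python
import math


def icc(theta, a, b, c, D=1.7):
    """Assumed three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def unweighted_area(item_group1, item_group2, lo=-5.0, hi=5.0, step=0.005):
    """Area between two equated curves over [lo, hi] by the midpoint rule.

    Every ability level counts equally, however few examinees are found there.
    """
    n_steps = int(round((hi - lo) / step))
    area = 0.0
    for k in range(n_steps):
        theta = lo + (k + 0.5) * step
        area += abs(icc(theta, *item_group1) - icc(theta, *item_group2)) * step
    return area


# Hypothetical item that is half a scale unit harder for the second group.
print(round(unweighted_area((1.0, 0.0, 0.0), (1.0, 0.5, 0.0)), 3))
```

Because each slice of the ability scale receives the same weight, differences at ability levels of -5.00 or +5.00 count as heavily as differences near the centre of the distribution, which is the limitation noted above.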

While researchers are beginning to apply latent trait
analysis procedures to some research projects, there are
many areas that are open to investigation. For example,
nowhere in the literature has any researcher reported the
investigation of item bias, using latent trait methodology,
of a test that was administered to two groups of examinees,
21 years apart, in order to assess achievement differences
between the two groups.


B. Edmonton Grade III Achievement Data 

The Edmonton Grade III Achievement Study was conducted
for the Alberta Advisory Committee on Educational Studies by
S.C.T. Clarke, V.R. Nyberg, and W.H. Worth.⁵ In conjunction
with a team of University of Alberta students, the author
was responsible for the data collection, test marking, data
coding, and data analysis for the Edmonton Grade III
Achievement Study.

Due to the complexity and relatively large size of this
project (involving over 8,000 subjects and 285 test items),
the data collection and processing were accomplished only
through the co-operation of a large number of students,
teachers, principals, and their respective schools, as well
as the Edmonton Public School Board, the Department of
Education for the province of Alberta, the Faculty of
Education of the University of Alberta, and the original
investigators of the 1956 study.⁶


⁵ N.M. Purvis was also involved with the Edmonton Grade III
Achievement Study until his death in March, 1977.

⁶ Many people were associated with the 1956 Edmonton Study of
Achievement (e.g., G.M. Dunlop, S.C.T. Clarke,
R.S. MacArthur, R.C. Harper, & W.H. Worth). The results of

The man-years of work that were expended in the acquisition
of these data were considerable, but the data undoubtedly
provide a wealth of information on the differences of student
achievement after a period of 21 years.

The following five tests made up a comprehensive test
battery that was administered to the Grade III students of
the Edmonton Public School system in 1956 and again in 1977:
1.   Raven Progressive Matrices (1947), Sets A, Ab, B;

2.   California Short Form Test of Mental Maturity
     (Sullivan, Clark, & Tiegs, 1953);

3.   Gates Advanced Primary Reading Tests for Grade 2
     (second half) and Grade 3, Form 1 (Gates, 1942);

4.   Gates Advanced Primary Reading Tests, Paragraph
     Reading, for Grade 2 (second half) and Grade 3, Form 1
     (Gates, 1942);

5.   California Achievement Tests Complete Battery, Primary
     Grades 1, 2, 3, and 4, Form CC: Reading, Arithmetic,
     and Language (Tiegs & Clark, 1950).

The Raven Progressive Matrices Test, Sets A, Ab, and B,
contains a total of 36 items. Each item presents a figure
with a section of the figure missing. The examinee must
select the correct missing section from the six alternatives
presented for each item. The test is untimed and is
described by Raven (1965) as "a test of observational and
⁶ (cont'd) this study were reported in various minutes of the
Alberta Advisory Committee on Educational Research and the 
minutes of the Edmonton Public School Board (Clarke et al., 
ROMO 
print as of 1979, and may be obtained commercially.

This is the only test administered to the 1977 students 
that did not coincide exactly with that administered in 
1956. The 1956 students were administered the 1947 edition 
of this test and the 1977 students were administered the 
1962 edition. The differences between the two editions are 
slight and as the 1947 edition was not available in 1977, 
the decision was made to use the 1962 edition. The test 
items in both editions (1947; 1962) are identical, but in 
subtest A, the order of items 11 and 12 was interchanged. In 
addition, the order of some of the distractors was 
rearranged for a few of the items in the 1962 edition. The
position of the correct alternative was, however, not
changed for any item. As this test was scored
dichotomously, the exact order of the distractors did not
play a part in the LOGIST analysis.

The California Short Form Test of Mental Maturity is 
out-of-print, but the publishers gave permission to the 
investigators of the Edmonton Grade III Achievement Study to 
reprint the test. A copy of this exam is presented in 
Appendix A. The California Short Form Test of Mental 
Maturity assesses the general intellectual maturity of 
students using 938 questions that vary in composition and 
complexity. While time limits are imposed on the student, 
the test is designed as a power test and not as a speed 
test. The examiners administered the directions and
questions for each test item as per the instructions in the
test manual. A copy of the directions for administration of
this test is presented in Appendix A.

The Gates Advanced Primary Reading Test: Word Recognition
is out-of-print. The publishers granted permission for the
investigators of the Edmonton Grade III Achievement Study to
reproduce the test. A copy of this test is presented in
Appendix A. The Gates Word Recognition Test was designed to
assess a student's ability to select a word that best
describes an accompanying picture from among four
alternatives for each of 48 items. The test items vary in
difficulty, from easy to hard, and were constructed under
the premise that, "The fewer [items] he [the student] can
recognize without error, the less ready he is to do
independent reading" (Gates, 1943).

The Gates Advanced Primary Reading Test: Paragraph
Reading is out-of-print. The publishers granted permission
to the investigators of the Edmonton Grade III Achievement
Study to reproduce this test and a copy of the test is
presented in Appendix A. The Gates Paragraph Reading Test
was designed to assess a student’s ability to read and
understand a statement directing the student to perform a
particular task and to carry out that specified directive.
Or, as described by Gates (1943), "This test measures
ability to read thought units with full and exact
understanding of the whole." This test is composed of 16
one-directive questions and 8 two-directive questions. The
tests are timed, but in general, the time allotted allows
all students to attempt all questions. Thus, this test was
not designed to be a speed test.


The California Achievement Tests Complete Battery
(Tiegs & Clark, 1950) is a comprehensive achievement test
containing several sub-tests covering a wide range of
abilities. While this test was administered to the 1956
students and the 1977 students, these data were not included
in the data analyzed in this thesis since the
appropriateness of the application of latent trait theory
could be adequately investigated by using the data obtained
from the other four tests.
Data Collection Procedures 

The testing took place about the end of May, both in 
1956 and 1977. The methods by which the data were collected 
in 1956 were again followed in 1977. The achievement tests 
were administered in the classroom by the teachers, and the 
mental maturity tests were administered by a team of 
University of Alberta students. The data from 1956 had been 
retained and were re-marked and coded for computer analysis 
at the same time that the 1977 data were processed. A team 
of University of Alberta students was responsible for the 
test marking and data coding. The data were then transferred 
to magnetic tape by the department of education, Alberta 
Education. A full description of the data collection was 
reported by Clarke, Nyberg, and Worth (1977, 1978). 

The purpose of the 1956 study was to assess the degree
of achievement of the Edmonton Public School Grade III
students, while the purpose of the 1977 study was to
assess the difference in achievement between the 1956
Edmonton Public School Grade III students and the 1977
Edmonton Public School Grade III students. There were
approximately 3,500 students involved in the 1956 study and
4,500 students involved in the 1977 study. To cause the
least inconvenience to the co-operating teachers, the battery
of tests was administered over a number of days. No attempt
was made to recover data that were lost as a result of
student absenteeism. Consequently, the number of students
who wrote any one test varied.


C. Chapter Summary 

The advantages of latent trait theory methodology over
classical test theory methodology have been advocated by many
researchers (e.g., Lord & Novick, 1968), although the latent
trait methodology has not yet been widely applied. The
importance of latent trait methodology in determining item
bias has also been demonstrated (e.g., Rudner, 1977), but a
method of determining item bias through latent trait
methodology which also accounts for the frequency
differences among the examinee ability levels has not been
demonstrated. In addition, latent trait theory has not been
applied to test data in order to determine the validity of
making achievement score comparisons of tests that were
administered many years apart. This thesis, then, attempts
to demonstrate an application of latent trait theory in
order to determine item bias without having to ignore the
frequency differences between the different examinee ability
levels.

In demonstrating this application of latent trait
theory, the validity of comparing a portion of the 1956 data
with the 1977 data from the Edmonton Grade III Achievement
Study was determined along with the resulting achievement
differences between the two groups. In addition, it was then
possible to compare the implications of the latent trait
theory item statistics with the implications of the
classical test theory item statistics reported by Clarke,
Nyberg, and Worth (1977). The following chapter describes the
method by which the data were analyzed.




III. METHOD 


Data Analysis 

The data from the Edmonton Grade III Achievement Study
were recoded to the input format required by LOGIST (Wood,
Wingersky, & Lord, 1976). The University of Alberta’s
Division of Educational Research Services’ scoring program
(SCORO1) was employed to score the data, which were then
reformatted as documented by Wood et al. (1976). Each test
from each study year (1956 or 1977) was then analyzed by
LOGIST, and the examinee abilities (THETAs) and the item
characteristic parameters were estimated.

The item parameters on a particular test administered
in 1956 cannot be directly compared to the item parameters
on the same test administered in 1977. As Lord (1976a)
indicates, it

     is inherent in the nature of the problem that the
     origin and the unit for measuring ability cannot be
     determined from the data.

Before a comparison of the item characteristic curves
can be made between the two study years, the item
characteristic curve parameters that were obtained from the
1977 students must be rescaled to the scale obtained from
the 1956 students. The only term in the item characteristic
function (2) that is dependent on the examinee ability value
($\theta$), the difficulty ($b_i$), and the discrimination
($a_i$) item parameters is $a_i(\theta - b_i)$. The scale
values of these variables can be transformed linearly, but
the transformation must be accomplished without changing the
prediction of the latent trait model which, in this case, is
represented by function (2) (Allen & Yen, 1979, p. 256);
i.e., it is necessary that

$$a_i^{*}(\theta^{*} - b_i^{*}) = a_i(\theta - b_i) \qquad (3)$$

where:
     * denotes that the value has been rescaled.


In order to comply with the constraint set forward by
formula 3, the 1977 item characteristic curve parameters
were rescaled to the 1956 item parameters in the following
manner. The 1977 item parameters were rescaled by using
items that the logistic model fit well. An item was
considered to be ill-fitting or poorly defined if it had a
difficulty (b) parameter that was greater than 4.00 or less
than -4.00 on either the 1956 data or the 1977 data. Thus,
only those items that had a difficulty (b) parameter that
was within the range of plus or minus 4.00 were used for the
purpose of rescaling the 1977 data.


This procedure was necessary in order to prevent those
items that were extremely easy or extremely hard from
over-influencing the rescaling procedure. If the items that
had an extreme difficulty (b) parameter were not eliminated,
then the rescaled parameter would not fit the logistic model
and equation 3 would not be balanced.

Another reason necessitating this procedure was that
the item parameters were obtained after only 16 iterations
of the LOGIST program and not necessarily at convergence.
This decision was based on a cost-benefit estimate of
allowing LOGIST further computer time. As the data from both
the 1956 and 1977 students were analyzed in the same manner,
and as those items with extreme difficulty (b) parameters
had little to offer in the way of test information, there
was little reduction of test information and a large saving
in computing expenses.

Those items that had a difficulty (b) parameter between
-4.00 and +4.00 on both the 1956 data and the 1977 data made
up the set of items that formed the basis of the
standardization procedure. The mean and standard deviation
of this set of items were calculated for the difficulty (b)
parameter from the 1956 data. The item difficulty (b)
parameters from the 1977 data were then rescaled to the
scale obtained from the 1956 data by the following formula:

$$b^{*}_{77} = \bar{X}_{56} + Z_{77}\,S_{56} \qquad (4)$$

where:
     $b^{*}_{77}$ = the rescaled 1977 difficulty parameter;
     $\bar{X}_{56}$ = the mean of the 1956 difficulty parameter;
     $Z_{77}$ = the 1977 difficulty parameter Z score
          for item i;
     $S_{56}$ = the standard deviation of the 1956
          difficulty parameter.


The discrimination (a) parameter must be rescaled in
order to maintain the same amount of information that was
obtained by the original discrimination (a) parameter in
relation to the original difficulty (b) parameter. This may
be visualized by noting that the discrimination (a)
parameter controls the slope of the item characteristic
curve and, if the difficulty (b) parameter is rescaled, then
the slope of the curve is going to have to be adjusted
accordingly if equation 3 is going to remain true.

The 1977 data item discrimination (a) parameters were
rescaled by the following formula:
If the 1977 data item parameters are rescaled then the
1977 student abilities must also be rescaled. Given that the
item difficulty (b) and the ability scale are the same
scale, then the student abilities must be rescaled according
to the following formula:

     $\theta^{*}_{77}$ = the rescaled 1977 ability level;
     $\theta_{77}$ = the original 1977 ability level.

The rescaled value of ability will allow the determination
of the distribution of the rescaled 1977 student abilities
in relation to the 1956 student abilities.

The lower asymptote (c) item parameter of the 1977
data was reset to the respective 1956 data levels. This was
necessary because the logistic model indicates that if the
difficulty (b) parameter and the discrimination (a)
parameter are rescaled then they must adopt the lower
asymptote of the new scale (Marco, 1977).
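
The rescaling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in the study: it assumes that the Z score in formula 4 is computed from the 1977 mean and standard deviation of the retained items, that the discrimination parameter is divided by the same slope applied to the difficulty parameter, and that abilities are transformed exactly as difficulties are. All function and variable names are hypothetical. For simplicity the sketch rescales every item passed to it, although only the retained items define the transformation.

```python
from statistics import mean, pstdev


def anchor_indices(b_56, b_77, limit=4.0):
    """Indices of items whose difficulty is within +/- limit in both years."""
    return [i for i, (x, y) in enumerate(zip(b_56, b_77))
            if abs(x) <= limit and abs(y) <= limit]


def rescale_1977(a_77, b_77, c_56, theta_77, b_56, limit=4.0):
    """Place 1977 item parameters and abilities on the 1956 scale."""
    keep = anchor_indices(b_56, b_77, limit)
    anchors_56 = [b_56[i] for i in keep]
    anchors_77 = [b_77[i] for i in keep]
    m56, s56 = mean(anchors_56), pstdev(anchors_56)
    m77, s77 = mean(anchors_77), pstdev(anchors_77)
    slope = s56 / s77  # the same ratio results from sample or population SDs

    def to_1956_scale(x):
        # Z score on the 1977 scale, re-expressed in 1956 units.
        return m56 + slope * (x - m77)

    b_star = [to_1956_scale(b) for b in b_77]
    a_star = [a / slope for a in a_77]      # keeps a * (theta - b) unchanged
    theta_star = [to_1956_scale(t) for t in theta_77]
    c_star = list(c_56)                     # lower asymptotes reset to 1956
    return a_star, b_star, c_star, theta_star


# Hypothetical four-item test; the last item is too extreme to be retained.
b_56 = [-1.0, 0.0, 1.0, 5.2]
b_77 = [-1.5, 0.5, 2.5, 6.0]
a_77 = [0.5, 0.6, 0.7, 0.8]
c_56 = [0.045, 0.045, 0.045, 0.045]
theta_77 = [-0.5, 1.5]
a_star, b_star, c_star, theta_star = rescale_1977(a_77, b_77, c_56, theta_77, b_56)
print(b_star, theta_star)
```

Under these assumptions the product of the discrimination parameter and the distance between ability and difficulty is the same before and after rescaling, which is the constraint that the rescaling is required to satisfy.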

The following proof is offered as substantiation that
equation 3 is true when the rescaling formulae 4, 5, and 6
are applied to the rescaling of the 1977 item parameters to
the scale obtained from the 1956 data.
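
One way the argument can be written out is sketched here. The forms used for the discrimination and ability transformations are assumptions chosen to be consistent with formula 4 (with the Z score taken about the 1977 mean $\bar{X}_{77}$ and standard deviation $S_{77}$ of the retained difficulty parameters); they are not quoted from the original formulae 5 and 6.

$$
\begin{aligned}
b^{*}_{77} &= \bar{X}_{56} + S_{56}\,\frac{b_{77} - \bar{X}_{77}}{S_{77}}, \qquad
\theta^{*}_{77} = \bar{X}_{56} + S_{56}\,\frac{\theta_{77} - \bar{X}_{77}}{S_{77}}, \qquad
a^{*}_{77} = a_{77}\,\frac{S_{77}}{S_{56}}, \\[4pt]
a^{*}_{77}\left(\theta^{*}_{77} - b^{*}_{77}\right)
 &= a_{77}\,\frac{S_{77}}{S_{56}} \cdot \frac{S_{56}}{S_{77}}\left(\theta_{77} - b_{77}\right)
 = a_{77}\left(\theta_{77} - b_{77}\right).
\end{aligned}
$$

The additive constants cancel in the difference $\theta^{*}_{77} - b^{*}_{77}$, and the two scale factors cancel in the product, so the constraint in equation 3 is satisfied for any retained item under these assumed forms.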
Using the rescaled parameters, the probabilities of
success for each ability in the range of plus or minus 3.00
in steps of 0.2 were calculated from equation 2. The item
characteristic curves were then plotted.

A probable difference of item scores was calculated by
multiplying the differences (between 1956 and 1977) of the
probability of success for a given ability level by the
proportion of total scores found at that ability level, then
summing across all ability levels, i.e.,

$$P_i(\theta) = \sum_{\theta=-3}^{3} \left| P_{56}(\theta) - P_{77}(\theta) \right| \cdot \frac{N_{\theta 56} + N_{\theta 77}}{N_{56} + N_{77}}$$

where:
     $P_i(\theta)$ = the probable difference of item score
          between the 1956 and 1977 students on
          item i;
     $P_{56}(\theta)$ = the probability of success in 1956 for a
          student at a particular ability level ($\theta$);
     $P_{77}(\theta)$ = the probability of success in 1977 for a
          student at a particular ability level ($\theta$);
     $N_{\theta 56}$ = the number of 1956 examinees having the
          ability level $\theta$;
     $N_{\theta 77}$ = the number of 1977 examinees having the
          ability level $\theta$;
     $N_{56}$ = the total number of examinees for 1956;
     $N_{77}$ = the total number of examinees for 1977.
Those items that had a probable difference of item scores
greater than or equal to 0.05, 0.10, or 0.15 were identified
as being aberrant.
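
The probable difference of item scores can be sketched in code as follows. The sketch assumes the three-parameter logistic form with D = 1.7 for function (2) and weights each ability level by the pooled proportion of 1956 and 1977 examinees found there; the function names and the example counts are hypothetical.

```python
import math

THRESHOLDS = (0.05, 0.10, 0.15)


def icc(theta, a, b, c, D=1.7):
    """Assumed three-parameter logistic item characteristic curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def probable_difference(item_56, item_77, n_56, n_77, total_56, total_77):
    """Frequency-weighted absolute difference between two equated curves.

    item_56 and item_77 are (a, b, c) tuples on the 1956 scale; n_56 and
    n_77 map each ability level to the number of examinees found there.
    """
    total = 0.0
    for k in range(-15, 16):          # ability levels -3.0 to +3.0 by 0.2
        theta = k / 5
        weight = (n_56.get(theta, 0) + n_77.get(theta, 0)) / (total_56 + total_77)
        total += abs(icc(theta, *item_56) - icc(theta, *item_77)) * weight
    return total


def aberrant_levels(difference):
    """Thresholds at which an item would be identified as aberrant."""
    return [t for t in THRESHOLDS if difference >= t]


# Hypothetical counts at three ability levels only.
n_56 = {-1.0: 200, 0.0: 500, 1.0: 300}
n_77 = {-1.0: 100, 0.0: 400, 1.0: 500}
d = probable_difference((1.0, 0.0, 0.045), (1.0, 0.4, 0.045),
                        n_56, n_77, 1000, 1000)
print(round(d, 4), aberrant_levels(d))
```

Because the weights are proportions of examinees, a large gap between the curves at a sparsely populated ability level contributes little to the total, in contrast to an unweighted area between the curves.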

In order to determine the differences in abilities
between the 1956 group of examinees and the 1977 group of
examinees, the data from both study years were pooled and
subsequently analyzed by LOGIST. Because the resulting
abilities were then on the same scale, they were directly
comparable. Consequently, the abilities from the 1956
students and the 1977 students were identified and the mean
and standard deviation of the respective abilities were then
computed for each study year for each test. The difference
between these means represents the mean ability difference
between 1956 and 1977 on any given test.

In order to determine the effects of the aberrant items
at each difference level, the above procedure was repeated
after the appropriate aberrant items were removed. Thus,
mean student ability differences between the 1956 students
and the 1977 students were obtained under the following
conditions:
1.   when no test items were removed,
2.   when those test items that were aberrant at the ≥ .15
     difference level were removed,
3.   when those test items that were aberrant at the ≥ .10
     difference level were removed,
4.   when those test items that were aberrant at the ≥ .05
     difference level were removed.

As a technical aside, it should be noted that LOGIST
deletes all those students having a 100% score from the
sample before making its calculation of abilities.
Consequently, all 100% scores were re-introduced as an
ability of +4.00 before the mean differences of ability were
calculated.
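
The handling of perfect scores when computing the mean ability difference can be sketched as below. The function names are hypothetical, and the direction of the difference (1977 minus 1956) is an assumption made for the illustration.

```python
from statistics import mean

PERFECT_SCORE_ABILITY = 4.0  # ability assigned to examinees with a 100% score


def with_perfect_scores(estimated_abilities, n_perfect):
    """Re-introduce the examinees that the estimation program deleted."""
    return list(estimated_abilities) + [PERFECT_SCORE_ABILITY] * n_perfect


def mean_ability_difference(abilities_56, perfect_56, abilities_77, perfect_77):
    """Mean 1977 ability minus mean 1956 ability on the pooled scale."""
    all_56 = with_perfect_scores(abilities_56, perfect_56)
    all_77 = with_perfect_scores(abilities_77, perfect_77)
    return mean(all_77) - mean(all_56)


# Hypothetical abilities from a pooled analysis.
print(mean_ability_difference([0.0, 1.0], 0, [0.0], 1))
```

Leaving the perfect scorers out would lower the mean of whichever group contained more of them, so they are added back before the group means are compared.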


IV. RESULTS


The results chapter will be presented in five sections;
one section for each test. The discussion of the results for
each test takes place in the following chapter. Within each
section the following information is presented:


1.   The sample size of the 1956 Grade III study and the
     sample size of the 1977 Grade III study for the
     selected test.

2.   When the test numbering sequence differs from the
     numbering sequence used in the analysis procedure, a
     table showing the relationship between the numbering of
     the original test and that used for the analysis
     procedure is presented.

3.   The item parameters for the selected test as calculated
     by 16 iterations of LOGIST are presented for the 1956
     Grade III data and the 1977 Grade III data.

4.   The number of students that had obtained a particular
     ability level in 1956 and in 1977 are then presented.
     It should be noted that the sum of all the students
     across the ability levels -3.00 to +3.00 is less than
     the total number of students included in the analysis.
     This is because only the abilities between -3.00 and
     +3.00 were selected for the purpose of plotting the
     item characteristic curves. This was done in order to
     eliminate those instances where the results were poorly
     determined due to the lack of subjects at the extreme
     values of ability. The difference, then, between the
     total N calculated in this manner and the total N
     reported at the beginning of each test section is due
     to LOGIST's calculation of abilities outside the range
     of -3.00 to +3.00.

5.   The rescaled item parameters for the 1956 data and the
     1977 data for the selected test. This allows for the
     calculation of the item characteristic curves to be on
     the same scale, and, thus, be directly comparable.

6.   The probable difference of scores between the 1956
     students and the 1977 students for each item on the
     selected test are presented.

7.   The average abilities of the 1956 Grade III students
     and the 1977 Grade III students are presented under the
     following conditions:

     a.   for a set of test items that was constructed by
          removing all of the aberrant test items that had
          an item probable difference ≥ 0.05;

     b.   for a set of test items that was constructed by
          removing all of the aberrant test items that had
          an item probable difference ≥ 0.10;

     c.   for a set of test items that was constructed by
          removing all of the aberrant test items that had
          an item probable difference ≥ 0.15;

     d.   for the complete test, i.e., where no items have
          been eliminated prior to analysis by LOGIST.

          Within each condition, the students that
          obtained a score of 100% are removed by LOGIST
          before computing the overall ability scores. Thus,
          the students who obtained a score of 100% were
          assigned an ability level of 4.00 before the
          ability means were calculated.

8.   The item characteristic curves for the 1956 data and
     the 1977 data are presented together for each selected
     test item that was included in the standardization
     procedure prior to the rescaling of the item parameters
     obtained from the 1977 data. This was done in order to
     facilitate the comparison of the performance of the
     1956 students and the 1977 students on a given item
     across all levels of student ability.

Raven Progressive Matrices Test 
The data from 3,596 students who were administered
Raven’s Progressive Matrices Test in 1956 were analyzed
along with the data from 2,577 1977 students. Table 1
presents the study item numbers used in the analysis and the
related test item numbers. The 1977 sample represents
approximately 60% of the Grade III students enrolled in the
Edmonton Public School system in 1977. The remaining 40% of
the 1977 population was not administered Raven’s
Progressive Matrices Test. These students were randomly
selected to receive an alternative test for a companion
study that was being administered at the same time as the
Grade III Achievement Study.
The item parameters from the 1956 student data and the
1977 student data, when analyzed separately by LOGIST, are
presented in Table 2. Table 3 presents the rescaled item
parameters. The number of students found at each of the
specified levels of ability is presented in Table 4.

The probable difference of scores between the 1956
Grade III students and the 1977 Grade III students for each
test item is presented in Table 5. The results from the
aberrant test items that were identified in Table 5 for each
of the different aberrant levels were removed from the total
test data. The selected data were then pooled across the two
study years and were reanalyzed by LOGIST. This was done in
order to obtain new student abilities which did not include
the results from the aberrant items. Mean abilities were
then calculated for both the 1956 Grade III students and the
1977 Grade III students at each of the specified levels of
aberrance and are presented in Table 6. The item
characteristic curves for the Raven Progressive Matrices
Test are presented in Figures 2 to 26.

Gates Paragraph Reading 

The data from 3,569 students who were administered the
Gates Paragraph Reading Test in 1956 were analyzed along
with the data from 4,430 1977 Grade III students. The test
item numbers of the Gates Paragraph Reading Test and the
corresponding Grade III Achievement Study numbers are
presented in Table 7.


The item parameters for the 1956 data and the 1977 data 
Table 1
Study Item Number Assignment for Raven's Progressive
Matrices Test

Raven (1947)   1956      1977        Raven (1947)   1956      1977
Item #         Study #   Study #     Item #         Study #   Study #

A   1           1         1          Ab  7          19        19
    2           2         2              8          20        20
    3           3         3              9          21        21
    4           4         4             10          22        22
    5           5         5             11          23        23
    6           6         6             12          24        24
    7           7         7
    8           8         8          B   1          25        25
    9           9         9              2          26        26
   10          10        10              3          27        27
   11          11        12              4          28        28
   12          12        11              5          29        29
                                         6          30        30
Ab  1          13        13              7          31        31
    2          14        14              8          32        32
    3          15        15              9          33        33
    4          16        16             10          34        34
    5          17        17             11          35        35
    6          18        18             12          36        36
Table 2
Item Parameters for Raven's Progressive Matrices
Table 3
Rescaled Item Parameters for the Raven Progressive Matrices

                 1956                       1977
ITEM #      a      b      c            a      b      c
1 N/A N/A 
2 N/A N/A 
6 N/A N/A 
4 N/A N/A 
% N/A N/A 
6 N/A N/A 
i OR 4G See ereoe 03045 OSG) Seal 02045 
8 O02 899922959 0.045 OE sa) 02045 
g ee ee eiays 07045 OSG aa ss iays OF 045 
10 Chziwasy 2 Oe! OES Oa =) a HSHO) QO2045 
1 sews iO) Q2045 0585 (22936 02045 
12 0.492 0.432 OOS e595 1.086 02045 
ie N/A N/A 
14 N/A N/A 
15 N/A N/A 
16 0/38 === 05980 02045 OR60 Sim = 1461077 02045 
Vi Onis “SO Cres 0.045 OS C4i ee aoe 0.045 
18 G2996u =05963 6.045 Oe) mths Vee O2045 
13 Oo Oa an 049 02045 UBG Aimee oe O2045 
20 Uo 0-655 0045 OF6OS Sen 0e2o4 02045 
21 One 17 07268 0.045 Qe ae 2 Oe On 0.045 
oe O55) me One 0 0.045 OMS eee Us oC 02045 
we 0), Sas eon On 045 OF462) 9=0-.012 ORO4s 
24 1.939 1.068 0.056 1.770 Po SOLS 02056 
25 N/A N/A 
26 N/A N/A 
2. eeteners) RGR) C2045 OO. w= 23 OS O7045 
28 OF S38 ie ae On045 Of 6 Gawain CO On 045 
29 07908-07400 0045 CLV -st6) eisxe} On045 
30 1085 (Oh Alleys: 0.068 ORG? jm ea 0.068 
oc Omics 0.419 07045 Oca ORR 0.045 
32 1.939 G7.25 OF045 eee ae Ome Siva 0.045 
So 1.939 07586 0.045 eae Ona 15 0.045 
34 129S9 0.482 0.090 Ae ee 0.083 0.090 
3 asker O27 09 02029 ee 07395 0.029 
36 cee eos O2016 ey ou 02016 


N/A - This item was not included due to an extreme original 


difficulty parameter. 
Table 4
The Number of Students Located at Each Level of Ability for
the Raven Progressive Matrices Test

Ability   1956   1977*      Ability   1956   1977*
oto a ONG ae S OF20 So iS) 
Secu 29 14 0.40 291 234 
ley 34 ZO 0.60 245 ASS) 
-2.40 30 24 0.80 267 234 
e220 46 ee 1200 240 107 
ce 200 43 Se pe!) 194 189 
= 15.00 66 46 1.40 Oy 134 
i edend) 70 56 1 G0 a)! oy 
= 40 89 30 180 23 96 
eae, 0 114 Sy Zod 16 75 
Beau 1S 63 220 8 49 
702.80 ee 66 2.40 - eZ 
SO OU tiled 49 2.60 dal 5) 
-0.40 12 LG 2/30 + 16 
woe 225 149 SOU 1 eS) 

0.00 Ola EGL 


* The 1977 distribution following the rescaling procedure 
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Table 5 
Aberrant Items at Specified Levels of Difference 


Raven's Progressive Matrices Test 


Item # Diff. ≥ 0.05 ≥ 0.10 ≥ 0.15


1 N/A 
2 N/A 
S N/A 
o N/A 
S N/A 
6 N/A 
7 070365 
8 0.0768 = 
3 Om0 ea, 
10 0.0428 
1 OR0638 As 
12 Omizog a: 8 
13 N/A 
14 N/A 
ie N/A 
16 0.0645 * 
ies 0.0062 
18 0.0621 * 
RS) 0.0258 
ZU 0.0862 - 
2 0.0714 * 
22 050228 
23 O05 12 * 
24 0.0514 * 
25 N/A 
26 N/A 
2 One 122 
28 0.0348 
as CAGE ihe) 
30 Ose itsts) + 
31 On07 93 fs 
oe OOS) ss 
oe 0.0586 - 
34 OR250 * 
a1) Or 0v9 8 * 
36 0.0647 f 


N/A - This item was not included due to an extreme original 
difficulty (b) parameter. 
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Table 


Mean Student Abilities for Rav 


Level of 1956 1977 1 
aberrance students’ students 


mean = -0.204 OF276 
THO 
S80 te=ne. 00 1.189 
OanS Same as above 
mean = -0:.168 0.259 
0.10 
Sra a= Bae ool W easel 
meane =F) = 14087 loa6 
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Ssom suac. 0s9 2 he 


Ne.=7 33596 Eyl) Uh 
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en’s Progressive Matrices Test 


956+1977 # items # items 
combined removed remaining 


-0.004 
0 36 
1.140 
0 36 
O00 
4 a2 
Vien Si) 
eS) 
16 20 
2208 
Stes 




when analyzed separately by LOGIST are presented in Table 8. 
Table 9 presents the rescaled item parameters. The number of
students located at each specified level of ability is
given in Table 10.

The probable difference of scores between the 1956
Grade III students and the 1977 Grade III students for each
test item is presented in Table 11. For each level of
aberrance, the results from the aberrant test items
identified in Table 11 were removed from the total test
data. The selected test item data were then pooled across
the two study years and reanalyzed by LOGIST in order to
obtain new student abilities that did not include the
results from the aberrant items. Mean abilities were then
calculated for both the 1956 Grade III students and the
1977 Grade III students at each of the specified levels of
aberrance and are presented in Table 12. The item
characteristic curves for the Gates Paragraph Reading Test
are presented in Figures 27 to 49.
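
The probable item difference score described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the three-parameter logistic form with a scaling constant of 1.7, the use of the absolute difference between the two curves, and weighting by one pooled ability distribution are assumptions rather than details confirmed by the text, and all function names and numbers are hypothetical.

```python
import math

D = 1.7  # assumed logistic scaling constant; not stated in the text


def icc(theta, a, b, c):
    """Three-parameter logistic item characteristic curve (assumed form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def probable_item_difference(params_1956, params_1977, counts):
    """Sum over ability levels of |P_1956 - P_1977| weighted by the
    proportion of examinees located at that ability level."""
    total = sum(counts.values())
    score = 0.0
    for theta, n in counts.items():
        gap = abs(icc(theta, *params_1956) - icc(theta, *params_1977))
        score += gap * n / total
    return score


def flag_levels(score, thresholds=(0.05, 0.10, 0.15)):
    """Return the aberrance levels that a difference score reaches."""
    return [t for t in thresholds if score >= t]


# Hypothetical ability distribution and rescaled (a, b, c) parameters.
counts = {-1.0: 100, 0.0: 200, 1.0: 100}
score = probable_item_difference((1.0, 0.0, 0.2), (1.0, 0.5, 0.2), counts)
print(round(score, 4), flag_levels(score))
```

An item whose score reaches a given level would be dropped before the pooled reanalysis at that level of aberrance.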

Gates Word Recognition 

The data from 3,540 1956 Grade III students were 
analyzed along with the data from 4,430 1977 Grade III 
students. Table 13 presents the study item number and the 
related test item number. The item parameters from the 1956 
data and those from 1977 data, when analyzed separately by 
LOGIST, are presented in Table 14. The rescaled item 
parameters are presented in Table 15. The number of students
located at specified levels of ability is presented in
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Table 7 


Study Item Number Assignment for Gates Paragraph Reading 


Item # Study # Item # Study #

1 1 17a 17
2 2 17b 18
3 3 18a 19
4 4 18b 20
5 5 19a 21
6 6 19b 22
7 7 20a 23
8 8 20b 24
9 9 21a 25
10 10 21b 26
11 11 22a 27
12 12 22b 28
13 13 23a 29
14 14 23b 30
15 15 24a 31
16 16 24b 32
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Item Parameters for Gates Paragraph Reading 
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Rescaled Item Parameters for Gates Paragraph Reading 


1956 
ITEM # a b c
1 N/A 
2 N/A 
3 N/A 
4 N/A 
S N/A 
6 N/A 
ii DE OR aa caer) o 
8 Ges 3a OF 562 OF 
9g N/A 
10 deer = Ba ee! On 
14 N/A 
ne. OSS  =as i lores 0-. 
ie: O46 5 Pease 4 Gee 
14 Op Gigeee 6 12 OF 
hs OFC eee eos 0. 
16 0.468 =i. 683 On 
17 O27 4s - 04199 OF 
18 N/A 
19 OOS Seat cos OR 
20 05639) -07392 0. 
oa 2 rUUO aU aoe S Os 
22 ls eeiss  SOeelere) OF 
a 122 49 030 02 
24 ORG So ean 0: 
20 Ob Wests) 1060 Oe 
26 Oye oe ok 
2th 2.000 Or loner! (Oy 
28 i) OS 0.926 OF 
29 OE hs le ON Ox 
30 22000 0.953 Ore 
ou 2.00 O22 0. 
2 O45 1.436 0 
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N/A - This item was not included due to an extreme original 


difficulty parameter. 
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Table 10 
The Number of Students Located at Each Specified Level
of Ability for the Gates Paragraph Reading Test

Ability 1956 1977* Ability 1956 1977*
soy Oe) i 6 Gh AG Bae Sy 
727 OU ZO 14 0.40 292 349 
=r 00 28 20 0.60 255 320 
-2.40 Zo 24 0.80 238 35 
oe 0 31 Os Oe 207 320 
S200 40 47 peas 168 156 
=. CU ae: Te 1.40 it2 2605 
valedeye) 63 2) lol 63 208 
-1.40 Tel 120 aa’ 38 116 
ried 110 126 200 20 63 
=. 00 150 154 ae 24 50 
4050 He 205 2.40 10 45 
aoU 209 219 2.60 7 eZ 
-0.40 247 241 2200 - i 
20520 249 128 6) Ch) 1 13 

0.00 285 286 


* The 1977 distribution following the rescaling procedure 
Table 11
Aberrant Items at Specified Levels of Difference

Gates Paragraph Reading Test

Item # Diff. ≥ 0.05 ≥ 0.10 ≥ 0.15
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N/A - This item was not included due to an extreme original 
difficulty (b) parameter. 
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Table 12 


Mean Student Abilities for the Gates Paragraph Reading Test 


Level of 1956 1977 1956+1977 # items # items
aberrance students students combined removed remaining 


meant nour ter G20 =) #OSe 
Had 0 32 
SmiCy, wae OS 1164 eae! 
mean’ b=—-02133 0.006 -O7 056 
Ort 5 1 31 
Sta. a=) eto laieg hes 
mecha =a Oeaod 0.002 =On079 
Ord 3 29 
CeCe = ee oO hee 14 1.209 
Mmedahla= @- 0 false 071057 Oey, 
O805 8 oa 
etek see) ita iene: 12 ey iP wecis) 


Nig e 5009 4430 Skis 


Table 16. 

The probable difference of scores between the 1956
Grade III students and the 1977 Grade III students for each
test item is presented in Table 17. For each level of
aberrance, the results from the aberrant test items
identified in Table 17 were removed from the total test
data. The selected test item data were then pooled across
the two study years and reanalyzed by LOGIST in order to
obtain new student abilities that did not include the
results from the aberrant items. Mean abilities were then
calculated for both the 1956 Grade III students and the
1977 Grade III students at each of the specified levels of
aberrance and are presented in Table 18. The item
characteristic curves for the Gates Word Recognition Test
are presented in Figures 50 to 88.

California Mental Maturity 

The data from 3,443 1956 Grade III students and from
4,378 1977 students were analyzed using LOGIST. Table 19
presents the study item numbers and the related test item
numbers. The item parameters calculated from the 1956 data
and from the 1977 data, when analyzed separately using
LOGIST, are presented in Table 20. The rescaled item
parameters are presented in Table 21. The number of students
found at each specified level of ability is presented in
Table 22.
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Table 13 


Study Item Number Assignment for Gates Word Recognition 


Item Word Study # Item Word Study #

apple 1 stumble 25
Peer 2 Cha rp 26
bread 3 razor 27
fire 4 mask 28
woman 5 shark 29
village 6 study 30
dinner 7 fierce 31
stop 8 temple 32
orange 9 military 33
lumber 10 gypsy 34
string 11 musician 35
knee 12 wrestle 36
forest 13 dwelling 37
sword 14 weapon 38
orchard 15 slumber 39
insect 16 admiral 40
grocer 17 medal 41
hatchet 18 arbor 42
merchant 19 garrison 43
anchor 20 dormitory 44
onion 21 chandelier 45
slipper 22 equestrian 46
arrow 23 pugilist 47
veil 24 rhythmic 48
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Table 14 


Item Parameters for Gates Word Recognition 
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Rescaled Item Parameters for Gates Word Recognition 
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Table 15 

1956 
ITEM # a b c
40 2000 0.930 O26 
41 leh ea @. 4641 05240 
42 1.896 (Oh este Or250 
43 2.000 eed O2145 
44 (Oasis! Ons22 0.240 
45 ORs s Cer 0.240 
46 Cua Bp aose G2 a6) 
47 N/A 
48 (Oe years tot lie, 0.240 


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N/A - This item was not included due to an extreme original 


difficulty parameter. 
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Table 16 
The Number of Students Located at Each Level of Ability for
the Gates Word Recognition Test

Ability 1956 1977* Ability 1956 1977*
-3.00 6 ; 0.20 225 307 
=2,80 g 1 0.40 219 308 
-2.60 8 0.60 257 431 
-2,40 16 3 0.80 267 241 
-2,20 29 5 1.00 210 197 
-2.00 44 14 20 171 289 
-1.80 51 19 1.40 Rie 156 
-1.60 81 29 1.60 72 83 
-1,40 109 48 1.80 39 90 
~1.20 146 118 2.00 29 51 
-1.00 179 146 D900 16 50 
-0.80 222 cy 2.40 10 34 
-0.60 259 286 2.60 11 11 
-0.40 260 321 2.80 14 
-0.20 227 310 3.00 

0.00 205 455 


* The 1977 distribution following the rescaling procedure 
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Table 17 
Aberrant Items at Specified Levels of Difference 


Gates Word Recognition Test 


Item # Diff. ≥ 0.05 ≥ 0.10 ≥ 0.15


1 N/A 
2 N/A 
3 N/A 
4 N/A 
5 N/A 
6 020543 * 
i N/A 
8 N/A 
g N/A 
10 GO 0139 
14 OF 0213 
12 ORO T27 
ke: 00053 
14 070078 
15 0.0299 
16 0.0070 
17 Om mz * 
18 OF 0327 
19 050374 
20 0.1409 * * 
2 1 0.0396 
ie OF 0210 
ne 0.0096 
24 OM10SS * * 
25 OF 0455 * 
26 0.0749 * 
Df 0.0085 
28 0.0048 
29 OP Oe!) * * 
20 0.0238 
31 0.06093 * 
So 0.0248 
33 0.0490 * 
34 020943 * 
25 0.0498 * 
36 OFOone * 
$i 0.0838 * 
38 0.0395 
39 0.0899 * 
40 0.1209 * * 
Item # Diff. ≥ 0.05 ≥ 0.10 ≥ 0.15
41 On 0/59 * 

42 Om05 34 * 

43 OmG551 * 

44 O20454 * 

45 Oy ree * * * 
46 O£0143 

47 N/A 

48 0.0945 * * 


N/A - This item was not included due to an extreme original 
difficulty (b) parameter. 
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Table 18 


Mean Student Abilities for the Gates Word Recognition Test 


Level of 1956 1977 1956+1977
aberrance students students combined 


mean = -0.167 Ome 4 OROE0 
(eae 
S-Cee= meee 1.060 1.149 
Meahiac= ss rs 0 0.208 On 0:57 
Oe ais) 
See see 240 1.060 SS 
mean = -0.033 OPZo8 OeeteH 
On01/0 
Seuss oe pa eels sf alleys: i258 
mean. =) 0203/7 02276 OF 178 
O05 
eC = OS 12 Oo 1.296 


N = 3540 4430 (EWES 


# items 
removed 


2) 


# items
remaining 


48 


47 


42 


27 


ied 
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test item are presented in Table 23. The results from the 
aberrant test items that were identified in Table 23 for 
each of the different aberrant levels were removed from the 
total test data. The selected test item data were then
pooled across the two study years and reanalyzed by LOGIST.
This was done in order to obtain new student abilities which
did not include the results from the aberrant items. Mean
abilities were then calculated for both the 1956 Grade III
students and the 1977 Grade III students at each of the
specified levels of aberrance and are presented in Table 24.
The item characteristic curves for the California Mental
Maturity Test are presented in Figures 89 to 147.
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Study Item Number Assignment for California Mental Maturity 
Item Parameters for California Mental Maturity
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Table 21 
Rescaled Item Parameters for California Mental Maturity 
1956 1977
ITEM # a b c a b c
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ITEM # a b c a b S 
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63 OS 39 See eats C220 OV 364 "31,047 QRZ 0 
64 OQ) Siler) Se ers 0.210 OnAe 2 Oe 645 QRZ 10 
65 igoos 12867 0.091 1495 Cin Orstc) 0203.4 
66 N/A N/A 
on N/A N/A 
68 N/A N/A 
69 0.486 0.688 2210 O87 92 1.449 210 
70 OL ee 1.887 On 1.199 oh AS 0.248 
7 1 N/A N/A 
he N/A N/A 
(s N/A N/A 
74 N/A N/A 
75 N/A N/A 
76 N/A N/A 
Ti N/A N/A 
78 N/A N/A 
79 N/A N/A 
80 OR etelss Shen On Ole Weee 2) TRS) 0220 
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hABeEE2O CONT) 


1956 1977
ITEM # a b c a b c
82 N/A N/A 
83 N/A N/A 
84 N/A N/A 
85 N/A N/A 
86 N/A N/A 
87 Ometsiet S309) e210 0.224 Pe Ske Drea 
88 N/A N/A 
89 OF SOM meaoeero © O20 Oromo: Pol Or 
90 Ome wee UiaisS 21.0 OF4/3> Fae 2166 OF 
oH N/A N/A 
G2 N/A N/A 
se Orc OBaee Uae tec Oe 2c Ol o2 0.184 05240 
94 0.411 3.810 02087 Oh Shans) Seas 0.087 
os OF Ue ne 0.210 02230) 20.7614 O20 
96 0.830 Zoo 0229 1 0.294 1.2360 Cee os 
oi ORO yal 1.468 On 210 Oh G2 702 OR 20 
98 N/A N/A 


N/A - This item was not included due to an extreme original 
difficulty parameter. 
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Table 22 
The Number of Students Located at Each Level of Ability for 


the California Mental Maturity Test 


Ability 1956 1977* Ability 1956 1977*
eo 0U fi 8 Oa20 7p a 267 
=2.80 ig 2D 0.40 367 136 
72.60 28 tae 0.60 38 1 303 
Sea 32 oo 0.80 229g Bers: 
ee 46 23 1.00 194 340 
=e) 43 25 1220 vie 207 
raat 64 ffs 1.40 123 383 
e560 74 55 P2650 64 160 
-1.40 105 ai 1.80 34 oo 
2.20 96 148 2.00 14 131 
=i 00 115 17 1 22 = 179 
as ou 162 Ga 2.40 5 129 
-0.60 aa A. 26 2.00 3 4a 
-0.40 263 190 2.80 = 44 
mas ine! 205 196 3.00 2 Bel 

0.00 160 om 


* The 1977 distribution following the rescaling procedure 
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Table 23 


Aberrant Items at Specified Levels of Difference 


Item # 
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California Mental Maturity 


Diff. ≥ 0.05 ≥ 0.10 ≥ 0.15


. 1948 
.2008 
SPE 
e2050 
.0600 
N/A 
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moe 
. 2364 
.2014 
"0039 
.0048 
.0051 
.0058 
O40 
0147 
20095 
.0087 
e0257 
50S) 1 
.0140 
205S0 * 
.0034 

20039 

.0061 

N/A 

UL ESis 
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.0056 

. 0096 

.0143 
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Table 23 (con’ t) 
Item # Diff. ≥ 0.05 ≥ 0.10 ≥ 0.15


4} 00707 si 
42 0.0249 

43 N/A 

44 N/A 

45 0.0432 

46 0.0540 * 
47 N/A 

48 N/A 

49 N/A 

50 N/A 

ea N/A 

a) N/A 

53 0.0834 * 
54 0.1042 * * 
She) 0.0443 

56 OF0095 

ai 0.0542 * 
58 N/A 

Ss, GSO 133 

60 N/A 

61 N/A 

62 0.0366 

63 0.0688 aE 
64 0.0428 

65 0.0910 7 
66 N/A 

67 N/A 

68 N/A 

69 0.1404 * is 
70 00921 * 
(a N/A 

ee N/A 

UE N/A 

74 N/A 

Us N/A 

76 N/A 

WA. N/A 

78 N/A 

79 N/A 

80 0.20312 

81 N/A 
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Tab le=23 (con't) 


Item # Diff. ≥ 0.05 ≥ 0.10 ≥ 0.15
83 N/A 

84 N/A 

85 N/A 

86 N/A 

87 Cred 

88 N/A 

89 00967 * * 

90 OF 1803 a * = 
91 N/A 

G2 N/A 

Jo 0.0274 

94 00573 a; 

a2 0.0918 * 

96 Ore 7 05 * + 
ey O20326 

98 N/A 


N/A - This item was not included due to an extreme original 
difficulty (b) parameter. 
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Table 24 


Mean Student Abilities for the California Mental Maturity 


Level of 1956 1977 1956+1977 # items # items
aberrance students students combined removed remaining 


medias) =) 0 ss20 Cie ors =C3040 
10 0 98 
Sere Ooo RON emer ae 
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Omi 9g 89 
Se), SO) isis 1.099 1.040 
ineela  o =10) etsy 02244 0.019 
a) 13 85 
Sige, So ier te a ee 12040 
recta) 3 S10, ees: 0.229 0.024 
0705 25 as 
shale Se = SU seeel tele te bs OWES: 
Noa 443 4378 (824 
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V. DISCUSSION AND CONCLUSIONS 


The discussion will be presented in the following manner:

1. an introduction that will discuss in general terms the
way in which the results were reviewed;

2. a discussion of the results of each of the specific
tests;

3. a comparison and discussion of the implications of the
item statistics that were computed by latent trait
theory methodology versus the implications of the item
statistics that were computed by classical test theory
methodology; and

4. a concluding summary.


A. Introduction


Each item on each test falls into one of the following
categories:

1. The item was very easy for both groups of students.
Hence, there is no difference in the pattern of student
response between the two study years.

2. The item had similar item characteristic curves for
both study years. This would indicate that students
having a similar ability in either 1956 or 1977 had an
equal probability of getting this item correct.
Therefore, this item was not biasing the results toward
either group of students for any one study year.

3. The item produced different item characteristic curves
for each study year. This would indicate that persons
having similar ability levels in 1956 and 1977 may not
have had the same probability of getting the item
correct. For some reason, then, this item would appear
to be measuring a different trait(s) over all student
abilities or some portion of the student ability
continuum. A condition of aberrance then exists for
this item. The degree of aberrance would indicate the
degree of item bias that exists. Depending on the
conditions that prevail, the bias may be toward either
the 1956 student or the 1977 student.

There are four ways that the item parameters obtained
from the 1977 students may differ from the item
parameters obtained from the 1956 students.

a. The discrimination (a) parameter is less in 1977
than it was in 1956 and the difficulty (b)
parameter is less as well.

b. The discrimination (a) parameter is less in 1977
than it was in 1956 but the difficulty (b)
parameter is higher.

c. The discrimination (a) parameter is higher in 1977
than it was in 1956 and the difficulty (b)
parameter is lower.

d. The discrimination (a) parameter is higher in 1977
than it was in 1956 and the difficulty (b)
parameter is higher as well.

The aberrant items will be discussed in terms of
these four categories in order to determine if there
was any consistent bias pattern among the aberrant
items.
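
The four-way classification of parameter shifts can be sketched in a few lines. This is an illustration only: the function name, the treatment of ties (an unchanged parameter is counted as "lower"), and the sample parameter values are all hypothetical and are not taken from the tables.

```python
from collections import Counter


def shift_category(a_1956, b_1956, a_1977, b_1977):
    """Classify how the 1977 parameters differ from the 1956 parameters.

    'a': discrimination lower, difficulty lower
    'b': discrimination lower, difficulty higher
    'c': discrimination higher, difficulty lower
    'd': discrimination higher, difficulty higher
    Ties are counted as 'lower' (an assumption made for this sketch).
    """
    a_higher = a_1977 > a_1956
    b_higher = b_1977 > b_1956
    if not a_higher and not b_higher:
        return "a"
    if not a_higher and b_higher:
        return "b"
    if a_higher and not b_higher:
        return "c"
    return "d"


# Hypothetical (a_1956, b_1956, a_1977, b_1977) values for three items.
items = {
    101: (1.00, 0.50, 0.80, 0.20),
    102: (0.60, -0.30, 0.90, -0.60),
    103: (0.70, 0.10, 1.10, 0.40),
}
tally = Counter(shift_category(*p) for p in items.values())
print(dict(tally))
```

Tallying the aberrant items this way gives the category counts that a summary such as Table 26 would report.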

In order to ascertain the degree of item aberrance, it 
was necessary to have a measure of the reliability of the 
item characteristic curve. The true error of this curve can 
be established by determining the probability of success for 
each value of ability as predicted by the model and then 
using the data to assess the actual degree of success for 
each student at each ability level for each item. Because of 
the expense of approaching the problem of standard error in 
this manner, it was considered that a pseudo standard error 


could be calculated in the following manner: 
The estimated standard error is indicated on each figure,
which depicts the relationship between the item
characteristic curve obtained from each study year for each
item and the estimated standard error. It is included only
as a rough guideline to standard error and is not intended
to represent the actual error between the model and the
data.


B. Discussion of Individual Test Results 

Raven Progressive Matrices Test 

Items 1, 2, 3, 4, 5, 6, 13, 14, 15, 25, and 26 were
identified in Table 2 as having a difficulty (b) parameter
that was less than -4.00 as calculated from either the 1956
or the 1977 data. The item characteristic curves for these
items would take the form of Figure 1. All of these items
were found by the students to be so easy that there was
little or no discrimination between the high-ability
students and the low-ability students. This means that
nearly all the students from both study years answered these
items correctly. This result clearly reflects Raven's test
format. Items 1 to 12 are from Raven's subtest A, which is
the easiest subtest within this test. Items 13 to 24 are
from Raven's subtest Ab, which was intended to be slightly
harder than subtest A. Items 25 to 36 are from Raven's
subtest B, the hardest of the three subtests. The first few
items in each subtest were found to be easy by the students
of both study years and, consequently, these items
demonstrate no bias toward either group of students.


The item characteristic curves for items 7, 9, 10, 17, 19,
22, 27, 28, and 29 (Figures 2, 4, 5, 9, 11, 14, 17, 18, and
19, respectively) indicate that these items performed in a
similar manner when administered in both study years; i.e.,
a student of a given ability level had an equal probability
of answering one of these items correctly, whether he
belonged to the 1956 group of students or to the 1977 group
of students (see Table 5).

The item characteristic curves (Figures 1 to 26), in
conjunction with the results presented in Tables 5 and 25,
form the basis of the following discussion of the location
of where the achievement differences between the 1956 and
the 1977 students occurred when the examinee ability
continuum was roughly divided into three parts: low (θ <
-1), medium (-1 < θ < +1), and high (θ > +1).
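
Locating a difference along the ability continuum can be sketched by averaging the gap between the two curves within each band. This is a rough illustration only: the cut points of -1 and +1, the three-parameter logistic form with a scaling constant of 1.7, and the parameter values are assumptions, and the function names are hypothetical.

```python
import math

D = 1.7  # assumed logistic scaling constant


def icc(theta, a, b, c):
    """Three-parameter logistic item characteristic curve (assumed form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def band(theta, cut=1.0):
    """Assign an ability value to the low, medium, or high band."""
    if theta < -cut:
        return "low"
    if theta > cut:
        return "high"
    return "medium"


def mean_gap_by_band(params_1956, params_1977, thetas):
    """Average P_1977 - P_1956 within each band; a positive value means
    the 1977 curve lies above the 1956 curve in that band."""
    totals = {"low": 0.0, "medium": 0.0, "high": 0.0}
    sizes = {"low": 0, "medium": 0, "high": 0}
    for theta in thetas:
        name = band(theta)
        totals[name] += icc(theta, *params_1977) - icc(theta, *params_1956)
        sizes[name] += 1
    return {name: totals[name] / sizes[name] for name in totals if sizes[name]}


# Ability grid from -3.00 to 3.00 in steps of 0.20, as in the ability tables.
grid = [round(-3.0 + 0.2 * i, 2) for i in range(31)]
gaps = mean_gap_by_band((1.0, 0.5, 0.2), (1.0, 0.0, 0.2), grid)
print({name: round(value, 3) for name, value in gaps.items()})
```

The band with the largest average gap indicates where on the continuum an item departs most between the two study years.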

Table 5 indicates that items 12, 30, 34, and 35 had an 
estimated item difference of at least 0.10. Figure 20 shows 
that for item 30, the probability of answering this item 
correctly was higher for the low- and medium-ability 1977 
students than for the 1956 students of similar ability. 

Figure 24 shows that for item 34, the medium-ability 
students in 1977 had a higher probability of answering this 
item correctly than did the students of similar ability in 


1956. The low- and high-ability students in 1977 had the 


same probability of success as their ability counterpart 
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Location of Item Bias by Ability Level 


Raven's Progressive Matrices Test 


Ability Level 
1956 1977
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item # low med. high low med. high 
Table 25 (con't)


Ability Level 


1956 1977
item # low med. high low med. high 
30 * * 
| * * 
ae 
Sc = 
34 > 
35 * 
36 * 


* Indicates those areas in the ability continuum where the 
students from a given study year had a higher probability 
of getting a given item correct than did the students of 
a similar ability level from the opposing study year. 
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1956 students. A similar pattern to that reported for item 
34 (Figure 24) is visible for item 35 (Figure 25). Figure 7 
shows that for item 12 the 1956 medium-ability students had 
a higher probability of answering this item correctly than 
did the 1977 medium-ability student. 

Table 5 shows that items 8, 11, 16, 18, 20, 21, 23, 24,
31, 32, 33, and 36 had an estimated item difference of at
least 0.05. There does not appear to be any single pattern
of difference among these items between the time they were
first administered in 1956 and the time when they were
readministered in 1977.

Figure 8 (item 16) indicates that low-ability students 
in 1977 had a higher probability of answering the item 
correctly than did the low-ability 1956 students. The higher 
ability students of both study years appeared to have an 
equal probability of answering this item correctly. 

Figures 10 and 12 (items 18 and 20) show a similar 
pattern where the medium-ability students in 1956 had a 
slightly higher probability of answering these two items 
correctly than did the 1977 students. This also appears true 
for item 11 (Figure 6) except that the high ability subjects 
may be affected as well. The reverse is true for item 23 
(Figure 15); i.e., the medium-ability students in 1977 have 
a higher probability of answering this item correctly than 
did the 1956 medium-ability students. However, the pseudo
standard error pattern shown in Figures 6 and 15 may
indicate that even though there is an estimated item
difference greater than 0.05, this difference is distributed
across a wide range of abilities and may not, in fact, be
significantly biasing any student located at any specific
ability level in either study year.

Figure 3 (item 8) shows that the 1956 students had a 
higher probability of answering this item correctly than did 
the 1977 students over all levels of ability. The ability
of this item to discriminate among the students of
different ability levels in either study group is, however,
quite low, as is indicated by the overall flat curve. Because
of this, the usefulness of this item could be questioned.

Figures 16, 22, and 23 indicate that for items 24, 32,
and 33, the medium-ability 1977 students had a higher 
probability of answering these items correctly than did the 
1956 students. 

Figures 16 and 26 indicate that for items 24 and 36,
the high-ability 1956 students had a higher probability of
answering these items correctly than did the 1977 students.

Figure 21 indicates that for item 31 the low-ability
1956 students had a lower probability of answering this item
correctly than did the 1977 students and, further to this,
the high-ability 1956 students had a higher probability of
answering this item correctly.

The different achievement means were reported in
Table 6. If all 36 items are included in the students'
evaluation, then the mean ability difference between the
1956 students and the 1977 students is 0.48 ability units, or
roughly half a standard deviation unit on the ability
scale, in favor of the 1977 students. This difference
indicates that the 1977 students were able to answer Raven's
test items more proficiently than were the 1956 students.

However, Table 5 indicated that when the Raven 
Progressive Matrices test was administered, not all the test 
items were measuring the underlying latent trait(s) in the 
same manner for each study group. 

When the data from the items that had an estimated item 
difference of 0.10 or greater were removed before the mean 
achievement difference score was calculated, the difference 
between the two study groups went down to 0.42 ability units 
in favor of the 1977 group. The decrease of the mean 
achievement difference from 0.48 to 0.42 ability units would 
indicate that those items that were removed were slightly 
favoring the 1977 group of students. 

When the data from the items that had an estimated item 
difference of 0.05 or greater were removed prior to the 
calculation of the mean achievement difference between the 
two study groups, this score equalled 0.46 ability units in 
favor of the 1977 students. The fact that the mean 
achievement difference score has remained almost constant 
across all conditions of aberrant item removal would suggest 
that some of the aberrant items in the test favored the 1956 
students while others favored the 1977 students. By 
reviewing the item characteristic curves (Figures 2 to 26), 


it can be observed that there is no consistent pattern of 
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aberrance in favor of either the 1956 group or the 1977 
group. Items 8, 11, 12, 18, 20, 24, 31, and 36 indicate that
the 1956 students had a higher probability of getting these
items correct. The 1977 students had a higher probability
than the 1956 students of getting items 16, 21, 23, 30, 32,
33, 34, and 35 correct.

Table 26 presents a summary of how the 1977 item
parameters differ in relation to the 1956 item parameters
for each of the aberrant items. A chi square of 0.41 (df=1)
further verifies that there was no consistent item parameter
difference pattern between the two study groups.
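
A chi-square statistic with one degree of freedom is what a 2 x 2 table yields, so one plausible reading is a test of independence between the direction of the discrimination shift and the direction of the difficulty shift. The sketch below assumes that reading, uses the uncorrected Pearson statistic, and runs on hypothetical counts; it does not reproduce the Table 26 data or the reported value of 0.41.

```python
def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


CRITICAL_DF1_05 = 3.841  # chi-square critical value for df = 1 at the 0.05 level

# Hypothetical counts: rows are a lower / a higher in 1977,
# columns are b lower / b higher in 1977.
counts = [[5, 3], [3, 5]]
statistic = chi_square_2x2(counts)
print(statistic, statistic > CRITICAL_DF1_05)
```

A statistic well below the critical value, as with the reported 0.41, is consistent with no systematic pattern in how the parameters shifted.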

The mean ability difference of 0.46 would further tend
to indicate that there were fewer 1977 low-ability students.
Thus, the probable difference of scores for any given item
may not have a significant impact if it is favoring the 1956
low-ability students.

The mean achievement difference of approximately 0.5 of 
a standard ability unit, becomes even more significant when 
the mean ages of the two study groups are taken into 
account. The mean age of the 1956 students reported by 
Clarkeweg al (19/ i sp we 0) wase 106.0 monthsei N= o>o43- 
S.D. = 7.37), while the mean age of the 1977 students was 
L0Geo femonths s0N==04 4/206) De 550) eRaven™ 61965) report s 
that as a child gets older he is expected to perform better 
on the Progressive Matrices test. Thus, the fact that the 
1977 students had a higher mean ability than did the 1956 


students at an earlier age indicates that the 1977 students, 
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Table 26 


The Relationship Between the 1977 Aberrant Item Parameters 


and the 1956 Aberrant Item Parameters for Raven's 
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on the average, are more intelligent than the 1956 
students. / 
Gates Paragraph Reading Test

Items 1, 2, 3, 4, 5, 6, 9, 11, and 18 were identified
from Table 8 as having a difficulty (b) parameter that was
less than -4.00 as calculated from either the 1956 or the
1977 data. The item characteristic curves for these items
would take the form of Figure 1. The students from both
study years found these items to be easy. Consequently,
these items were not found to be biased toward either the
1956 students or the 1977 students.

The item characteristic curves (Figures 27 to 49), in
conjunction with the results presented in Tables 11 and 27,
form the basis of the following discussion of the location
of where the achievement differences between the 1956 and
the 1977 students occurred when the examinee ability
continuum was roughly divided into three parts: low (θ <
-1), medium (-1 < θ < +1), and high (θ > +1).

RDC S ai, me Ue meal oe mo wen men Or (en osln 2 hee ire vias 
DOME ano oe rniGUheSme /mare OOO ERONie§ Oo sav SOrns 4 6o ue aes 
42, 43, 45, 46, 48, and 49, respectively) had estimated item 
differences of less than 0.05. There was no evidence of item
bias with these items, and students of a given ability level
in 1956 or 1977 would appear to have an equal probability of
answering these items correctly.


7 Accepting the premise that Raven’s Progressive Matrices 
test does, in fact, measure intelligence. 
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Table 27 
Location of Item Bias by Ability Level for
Gates Paragraph Reading Test

Ability Level
1956 1977


item # low med. high low med. high 
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* Indicates those areas in the ability continuum where the 
students from a given study year had a higher probability 
of getting a given item correct than did the students of 
a similar ability level from the opposing study year. 
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There was only one item that had an estimated item 
difference of greater than 0.15. This was item number 8 
(Figure 28). While the 1977 students appeared to have a 
higher probability of answering this item correctly than did 
the 1956 students of similar ability levels, there appeared 
to be some confusion over this item by students of both 
study years. This is evidenced by the relatively high value 
of the lower asymptote and the shallow slope of the item 
characteristic curves. Item number 8 asks, "Which tree does
not lose its leaves," and shows a maple tree, an elm tree,
and a pine tree. The confusion appears to result from
whether or not the "needles" on the pine tree constitute
"leaves". While this makes the item a poor one, it is still
interesting to note that the 1977 students had a higher
probability of answering this item correctly.

Items 19 and 23 (Figures 36 and 40, respectively) had
estimated item differences greater than 0.10, along with the
previously mentioned item number 8. For both of these items,
the medium-ability 1956 student had a higher probability of
answering the item correctly. There are no obvious reasons
why these items would be biased in favor of the 1956
students. One could speculate, however, that the differences
in clothing fashions between 1956 and 1977 and the changes in
the social studies curriculum over 21 years both contributed
to the observed student response pattern.

In addition to those items previously reported, items 
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respectively) had estimated item differences greater than 
0.05 (see Table 11). The 1956 students appear to have had a
higher probability of answering all these items correctly,
except for item 22, where the low-ability 1977 students had a
higher probability of answering the item correctly than did
the low-ability students in 1956. The results obtained from
the 1977 students on item 22 may be related to the fact
that more 1977 students answered the item correctly than did
the 1956 students and, consequently, the model fit may be
invalid due to a lack of data for the extreme values of
ability, or to the instruction "Draw a line around" being
more confusing to the low-ability 1956 students than to the
low-ability 1977 students. The bias of the remaining items
may be attributed to the use of the term "modern" in the test
directions and to the fact that the pictures incorporated in
the test items would not appear "modern" to the 1977
students. Another cause of the apparent bias can be seen in
the relationship between the 1956 students' environment and
the environment of the 1977 students.

Table 12 indicated that when the mean achievement 
difference was calculated using all the test items, the 1977 
students performed better than the 1956 students by 0.154 
ability units. When the data from the aberrant items were 
removed prior to the computation of the mean achievement 
difference, the following pattern appeared.

Recall that the 1977 students had a higher probability
of getting item 8 correct over all ability levels than did
the 1956 students of the same ability level. When the data
from this item were removed and the mean achievement 
difference was recalculated, the 1977 students performed 
better than the 1956 students by only 0.139 ability units. 
The drop in the mean achievement difference would 
substantiate that this item was, in fact, favoring the 1977 
students and, consequently, the data from this item should 
be removed before making any comparisons between the study 
groups. 

There were, however, other items that appeared to favor
the 1956 students, and when the data from these items (19
and 23) were removed along with the data from item 8, the
mean achievement difference rose to 0.182 ability units in
favor of the 1977 students. When the data for all items
having an estimated item difference of 0.05 or greater were
removed, the mean achievement difference was 0.189 ability
units in favor of the 1977 students.
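
The successive removal of aberrant items described above can be sketched in Python as a simple filter. This is only an illustration under stated assumptions: the function name `items_retained` and the item difference values are hypothetical, not the thesis data, and the re-estimation of examinee abilities that produces the mean achievement difference is not shown.

```python
def items_retained(item_differences, cutoff):
    """Return the item numbers whose estimated item difference is at most `cutoff`.

    `item_differences` maps item number -> estimated item difference.
    The values used below are hypothetical, not taken from the thesis tables.
    """
    return sorted(item for item, diff in item_differences.items() if diff <= cutoff)


# Hypothetical estimated item differences for six items.
example_differences = {3: 0.02, 8: 0.17, 11: 0.04, 19: 0.12, 22: 0.07, 23: 0.11}

# Successively stricter cutoffs, mirroring the 0.15, 0.10, and 0.05 levels.
retained_by_cutoff = {
    cutoff: items_retained(example_differences, cutoff)
    for cutoff in (0.15, 0.10, 0.05)
}
```

Each retained set would then be used to re-estimate the examinee abilities before the mean achievement difference is recomputed.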

Table 28 presents a summary of the 1977 item parameters 
in relation to the 1956 item parameters for each of the 
aberrant items. It is interesting to note that all the
aberrant items had better discrimination properties in 1956
than in 1977. This could be a result of the test being out
of date, so that the 1977 students found these items more
confusing than did the 1956 students. Two of the aberrant
items (8 & 22) were found to be easier for the 1977 
students, while the remaining aberrant items favored the 
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Table 28 
The Relationship Between the 1977 Aberrant Item Parameters 
and the 1956 Aberrant Item Parameters for 


Gates Paragraph Reading 
These results would tend to indicate that the 1977
students performed better than would initially be concluded
from the mean achievement difference that
incorporated all test items. Those items that appeared to
favor the 1956 students included items dealing with "modern"
fans, the act of sewing, a train schedule, and a "modern"
use of flax (oil-based paints). As the 1977 student
interacts with air conditioning, ready-made clothes, planes,
and latex or acrylic based paints, as opposed to fans,
sewing, trains, and oil-based paints, it is understandable
that the 1977 students may not have had the same probability
of answering those items correctly as did the 1956 students.

If the age difference previously reported is again 
taken into consideration, it would appear that the average 
1977 student has more ability to read and understand a 
statement directing him/her to perform a particular task 
than did the average 1956 student. 

Gates Word Recognition Test 

Items 1, 2, 3, 4, 5, 7, 8, 9, and 47 were identified
from Table 14 as having a difficulty (b) parameter that was 
less than -4.00 or greater than +4.00, as calculated from 
either the 1956 or 1977 data. The item characteristic curves 
for those items that had a difficulty (b) parameter of less 
than -4.00 would take the form of Figure 1. The item
characteristic curves for those items that had a difficulty
parameter of greater than +4.00 would take the form of
Figure 148. These items were not found to be biased toward
either the 1956 students or the 1977 students because they
were either so easy or so difficult that the students from
both study years either got them correct, in the case of very
easy items, or wrong, in the case of very difficult items.
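
To illustrate why items with extreme difficulty parameters cannot show group differences, the following Python sketch evaluates an item characteristic curve at several ability levels. It assumes the three-parameter logistic model with the conventional scaling constant D = 1.7; the parameter values are hypothetical and are not taken from the thesis tables.

```python
import math


def icc_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the three-parameter logistic model.

    a: discrimination, b: difficulty, c: lower asymptote (guessing).
    D = 1.7 is the usual scaling constant; its use here is an assumption.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


abilities = [-3, -2, -1, 0, 1, 2, 3]

# Hypothetical very easy item (b < -4.00): nearly everyone succeeds.
very_easy = [icc_3pl(t, a=1.0, b=-4.5, c=0.2) for t in abilities]

# Hypothetical very difficult item (b > +4.00): success stays near the guessing level.
very_hard = [icc_3pl(t, a=1.0, b=4.5, c=0.2) for t in abilities]
```

Across the ability range where examinees are actually found, both curves are nearly flat, so two groups of equal ability have almost the same probability of success on such items.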

Items [...] (Figures [...],
respectively) had estimated item differences of less than
0.05. There was no evidence of item bias with these items;
i.e., examinees of a given ability level in either 1956 or
1977 would appear to have an equal probability of answering
these items correctly.

The item characteristic curves (Figures 50 to 88) in 
conjunction with the results presented in Tables 17 and 29, 
form the basis of the following discussion of where the 
location of the achievement differences between the 1956 and 
the 1977 students occurred when the examinee ability 
continuum was roughly divided up into three parts: low (θ <
-1), medium (-1 < θ < 1), and high (θ > 1).

Item 45 (Figure 86, chandelier) had an estimated item 
difference of greater than 0.15. The 1977 medium and high 
ability students had a higher probability of answering this 
item correctly than did the 1956 students of similar ability 
levels. The lower asymptote would indicate that guessing was 
high for the lower ability students of both study years. The 
word "chandelier" appears to be a word that was known by
both the medium and high ability 1977 students, but only the
Table 29
Location of Item Bias by Ability Level for
Gates Word Recognition Test

Ability Level
1956 1977
item # low med. high low med. high
Table 29 (con't)

                    Ability Level
              1956                1977
item #   low  med.  high     low  med.  high
30
31        *    *
32
33        *                        *
34                                 *
35        *                        *
36                                 *
37                   *        *
38
39             *
40                                 *
41                                 *
42                   *             *
43                   *
44                                 *
45                                 *     *
46
47
48             *                         *


* Indicates those areas in the ability continuum where the 
students from a given study year had a higher probability 
of getting a given item correct than did the students of 
a similar ability level from the opposing study year. 
very high ability 1956 students.

Items 20, 24, 29, 40, and 48 (Figures 61, 65, 70, 81,
and 88, respectively) had an estimated item difference
greater than 0.10 (along with the previously mentioned item
45). The results for item 29 (shark) indicated that the 1977
low and medium ability students had a higher probability of
getting this item correct than did the 1956 students of
similar ability. The recent exposure that the 1977 students
had to films depicting sharks (e.g., "Jaws") could be an
example of how material external to the school environment
can influence a student's learning.

Items 20 and 40 (Figures 61 and 81) indicate that the
1977 medium ability students had a higher probability of
understanding the meaning of admiral than did the 1956
medium ability students. A greater number of students from
both study years correctly answered the anchor (item 20)
question than the admiral question (item 40). This may be
due to the unfamiliar way in which the admiral was drawn;
nevertheless, the item favored the 1977 group of students
rather than the 1956 group. Item 24 (Figure 65, veil)
favored the 1956 medium ability students. Veils were more a
part of the fashions in the fifties than in the seventies, so
it is reasonable that the 1956 students would have a higher
probability of getting this item correct. Item 48 (Figure
88, rhythmic) shows the 1956 medium ability students and the
1977 high ability students as having a higher probability of
getting this item correct than did the students of similar
ability in the opposing group of students. Overall, this
item was found to be difficult for both groups of students,
but the 1977 students in general would appear to have a
slightly higher probability of getting this item correct.

Items 6, 17, 25, 26, 31, 33, 34, 35, 36, 37, 39, 41, 42,
43, and 44 (Figures 50, 58, 66, 67, 72, 74, 75, 76, 77, 78,
80, 82, 83, 84, and 85, respectively), in addition to
those items previously reported, had estimated item
differences of greater than 0.05 (see Table 17). The item
characteristic curve plots show a variety of differences
among the ability groups and between the two study
groups.

Items 6, 17, and 31 (Figures 50, 58, and 72; village,
grocer, and fierce) indicate that the 1956 low and medium
ability students had a higher probability of getting these
items correct than the 1977 students of similar ability. The
1977 student would probably refer to the store manager or
the produce manager rather than the grocer. The use of the
term village in relation to the picture associated with item
6 has been superseded by the word 'town', and the picture of
a large cat in item 31 was not considered 'fierce' by the
1977 medium ability students. Item number 26 (Figure 67,
chirp) indicates that the 1977 low and medium ability
students had a higher probability of success on this item.
It is difficult to speculate why this is so. Could it be
that Charles Schulz's popular cartoon character, Woodstock,
has influenced the response pattern obtained for this item?
For items 33 and 35 (Figures 74 and 76; military and
musician, respectively) the 1977 medium ability and the 1956
low ability students had a higher probability of getting
these items correct. The differences are probably caused by
slight differences in discrimination between the
two study groups as opposed to specific changes in the item
difficulty obtained.

Items 34, 36, 41, and 44 (Figures 75, 77, 82, and 85;
gypsy, wrestle, medal, and dormitory, respectively) indicate
that the 1977 medium ability students had a higher
probability of getting these items correct.

Items 37 and 42 (Figures 78 and 83; dwelling and arbor, 
respectively) indicate that the 1956 high ability students, 
the 1977 low ability students (in the case of item 37) and 
the medium ability students (in the case of item 42) had a 
higher probability of getting these items correct than did 
the students of corresponding ability level from the other 
group. Items 25 and 39 (Figures 66 and 80; stumble and 
slumber) indicate that the 1956 medium ability students had 
a higher probability of answering these items correctly than 
did the 1977 medium ability students. Finally, item 43 
(Figure 84; garrison) indicates that the 1956 high ability 
students had a higher probability of answering this item 
correctly. 

Table 30 presents a summary of how the 1977 item 
parameters differ in relation to the 1956 parameters for 


each of the aberrant items. A chi square of 0.10 (df=1) 
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Table 30 
The Relationship Between the 1977 Aberrant Item Parameters 
and the 1956 Aberrant Item Parameters for 


Gates Word Recognition 
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suggests that there was no consistent item parameter 
difference pattern found between the two study groups. 

While there did not appear to be any systematic item 
differences found between the two groups of students, those 
items that were found to be aberrant did appear to favor the 
1977 students when the mean ability differences were 
calculated. For instance, Table 18 indicated that the mean
achievement difference between the 1956 students and the
1977 students was 0.391 ability units. However, the mean
achievement difference between the two groups dropped when
the data from the aberrant items were removed; i.e., when the
data from the item that had an estimated item difference
greater than 0.15 were removed, the mean achievement
difference between the 1956 and the 1977 students was 0.339
ability units. This value dropped further to 0.296 and then
to 0.239 ability units when the data from the items that had
an estimated item difference of greater than 0.10 and 0.05,
respectively, were removed.

These results suggest that the 1977 students were
better able to recognize the words that were administered
via the Gates Word Recognition test than were the 1956
students, even after the data from the aberrant items were
removed and even though the 1956 students were older, on
the average, than the 1977 students.

California Mental Maturity Test
Items [...] were
identified from Table 20 as having a difficulty (b)
parameter that was less than -4.00 or greater than +4.00, as 
calculated from either the 1956 or the 1977 data. The item 
characteristic curves for those items that had a difficulty 
(b) parameter of less than -4.00 would take the form of 
Figure 1. The item characteristic curves for those items 
that had a difficulty (b) parameter of greater than +4.00 
would take the form of Figure 148. These items were 
generally so difficult or so easy that students from both 
study years either got them correct in the case of the easy 
items, or wrong in the case of the difficult items. 
estimated item differences of less than 0.05; i.e., these
items displayed no evidence of item bias between the two
groups of students. In other words, examinees of a given
ability level in either 1956 or in 1977 would appear to have
an equal probability of answering these items correctly.

The item characteristic curves (Figures 89 to 147), in
conjunction with the results presented in Tables 23 and 31,
form the basis of the following discussion of where the
location of the achievement differences between the 1956 and
the 1977 students occurred when the examinee ability continuum
was roughly divided up into three parts: low (θ < -1),
medium (-1 < θ < 1), and high (θ > 1).

Items 1, 2, 3, 4, 7, 8, 9, 10, 90, and 96 (Figures 89,
90, 91, 92, 94, 95, 96, 97, 142, and 146, respectively) had
an estimated item difference of greater than 0.15. Items 1
through 10 deal with the examinee's ability to identify the
left or the right limbs of various given figures. The medium
ability 1956 students had a higher probability of getting
items 1, 2, 3, and 4 correct than did the 1977 students of
similar ability. This is also true of the 1956 high ability
students on items 1, 2, 3, 4, 8, 9, and 10. Generally,
these results would suggest that the 1956 students were
better able to correctly identify the right or left limbs of
the given figures than were the 1977 students, except for
the low and medium ability 1977 students on items 7, 8, 9,
and 10. The interaction effect identified between the 1956
and the 1977 students on items 7, 8, 9, and 10 arose because
these items did not discriminate the good students from the
poor students as well in 1977 as they did in 1956. Items 90
and 96 were two vocabulary questions (athletic and
construction) that the 1977 students of all ability levels
had a higher probability of answering correctly than did the
1956 students.

Items 54, 69, and 89 (Figures 128, 137, and 141, 
respectively) had an estimated item difference of greater 


than 0.10, but less than 0.15. Item 54 was a numerical 
Table 31
Location of Item Bias by Ability Level for the
California Mental Maturity Test

                    Ability Level
              1956                1977
item #   low  med.  high     low  med.  high
1              *     *
2              *     *
3              *     *
4              *     *
5              *
6
7                             *    *
8                    *        *    *
9                    *        *    *
10                   *        *    *
11
12
13
14
15
16
17
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* Indicates those areas in the ability continuum where the 
students from a given study year had a higher probability 
of getting a given item correct than did the students of 
a similar ability level from the opposing study year. 
series question that did not discriminate as well in 1956 as
it did in 1977. Consequently, the lower ability 1956 students
performed better than did the lower ability 1977 students,
but the higher ability 1977 students had a higher
probability of correctly answering this item than the high
ability 1956 students. Item 69 instructed the examinees to
mark the alternative that contained the number of cookies
that would result if the total number of cookies were divided
among three persons. While the item was not a good
discriminator for either study year, the 1956 students had a
higher probability of answering this item correctly than did
the 1977 students. Item 89 was a vocabulary item (vehicle)
which the 1977 students had a higher probability of
getting correct than did the 1956 students.

Items 5, 22, 39, 41, 46, 53, 57, 63, 65, 70, 94, and 95
(Figures 93, [...],
144, and 145, respectively) had an estimated item difference
of greater than 0.05, but less than 0.10. Item 5 was one of
the right/left identification problems which the 1956 medium
ability students had a higher probability of answering
correctly. Item 22 was a spatial relation problem that the
1977 high ability students had a higher probability of
answering correctly than did the 1956 high ability students.
Items 39, 41, and 46 dealt with item relationships. The
discrimination parameters (a) for these items indicated that
discrimination was not good and that guessing was high for
both groups of students. However, the 1956 low and medium
ability students, for items 39 and 41, and the high ability
1977 students, for item 46, had a higher probability of
getting these items correct than did the students of similar
ability levels from the other group of students. Items 53
and 57 were numerical series questions which the medium and 
high ability 1977 students had a higher probability of 
answering correctly than did the 1956 students. Items 63, 
65, and 70 were computational problems where the medium or 
high ability 1956 students had a higher probability of 
getting these items correct than did the 1977 students. 
Items 94 and 95 were vocabulary questions (eclipse and 
studious, respectively) that the medium ability 1977 
students, for item 94, and the medium and high ability 1956 
students, for item 95, had a higher probability of answering 
correctly than did a student of a similar ability level of 
the other study year. 

Table 32 presents a summary of how the 1977 item 
parameters differed in relation to the 1956 parameters for 
each of the aberrant items. A chi square of 0.27 (df=1) 
suggests that there was no consistent item parameter 
difference pattern found between the two study groups. 

There did not appear to be any systematic pattern of 
item differences found between the two groups of students. 
Table 24 indicated that the mean achievement difference 
between the 1956 students and the 1977 students was 0.553 
ability units in favor of the 1977 students when all the 


test items were included in the computation of the mean 
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Table 32 
The Relationship Between the 1977 Aberrant Item Parameters 
and the 1956 Aberrant Item Parameters for the 


California Mental Maturity Test 
abilities. When the data from the items that had an
estimated item difference of greater than 0.15 were removed
prior to analysis, the mean achievement difference
between the 1956 students and the 1977 students was 0.501
ability units. This value changes to 0.505 and then to 0.472
ability units when the data from the items that had an
estimated item difference of greater than 0.10 and 0.05,
respectively, were removed.

These results suggest that, on the average, the 1977
students were more intellectually mature, as measured by the
California Mental Maturity Test, than the 1956 students by
about 0.5 ability units. It is interesting to note that this
difference was found following the removal of the aberrant
items and in spite of the fact that the 1956 students were,
on the average, older than the 1977 students.


C. Classical Test Theory Implications versus Latent Trait 
Theory Implications 

The results from Raven's Progressive Matrices test were 
employed in the following discussion as an example of the 
differences that were found between the implications of the 
results obtained by using classical test theory methodology 
(Table 33)8 and latent trait theory methodology (Tables 3 
and 5). While the results of these two methodologies are not 


8 Difficulty (diff.) is the proportion of examinees who 
correctly answered a given test item and the discrimination 
coefficient (disc.) is the point biserial correlation 
between a given item response and the total test score on 
all the other items in the test. 
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Progressive Matrices Test 


Classical
1956

item   diff.   disc.
1      0.999   0.008
2      0.999   0.079
3      0.998   0.036
4      [...]   0.040
5      0.986   0.077
6      [...]   [...]
7      [...]   [...]
8      [...]   [...]
9      [...]   [...]
10     0.806   0.306
11     [...]   [...]
12     0.262   0.296
13     0.993   0.090
14     0.985   0.101
15     0.981   0.052
16     [...]   [...]
17     [...]   [...]
18     [...]   [...]
19     [...]   [...]
20     [...]   [...]
21     0.463   0.426
22     [...]   [...]
23     [...]   [...]
24     [...]   [...]
25     0.889   0.065
26     0.970   0.174
27     [...]   [...]
28     0.826   0.437
29     0.625   0.476
30     0.473   0.446
31     0.479   [...]
32     0.293   0.460
33     0.347   [...]
34     [...]   [...]
35     0.288   0.482
36     [...]   [...]
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directly comparable, because they are reported in different 
metrics, the implications of these results are comparable. 
If a difference level of 0.05 between the item 
difficulties obtained from the 1956 data and the 1977 data 
is assumed to be representing a significant difference 
between the two groups of examinees, then the item 
difficulties calculated, via classical test theory 
methodology, from the data obtained from items 7, 12, 16,
[...], and
36 (see Table 33) would be considered to be different for
each examinee group (1956 or 1977). In comparison, items 8,
[...], and
36 were identified as being different, between the two study
years, by the latent trait methodology employed in this
study (see Table 5).
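
The classical test theory indices compared here can be sketched in Python as follows. This is a minimal illustration, assuming a matrix of 0/1 item responses: difficulty is the proportion correct, and discrimination is the point biserial correlation between the item and the total score on all the other items. The function names and the response data are hypothetical, not the thesis data.

```python
import math


def classical_item_stats(responses, item):
    """Return (difficulty, discrimination) for one item.

    responses: list of examinee rows, each a list of 0/1 item scores.
    item: zero-based column index of the item of interest.
    """
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    n = len(responses)
    p = sum(item_scores) / n                      # proportion correct
    mean_rest = sum(rest_scores) / n
    cov = sum((x - p) * (y - mean_rest)
              for x, y in zip(item_scores, rest_scores)) / n
    sd_item = math.sqrt(p * (1.0 - p))
    sd_rest = math.sqrt(sum((y - mean_rest) ** 2 for y in rest_scores) / n)
    if sd_item == 0.0 or sd_rest == 0.0:
        return p, float("nan")                    # correlation undefined
    return p, cov / (sd_item * sd_rest)


def difficulty_shift_flagged(p_first, p_second, cutoff=0.05):
    """Flag an item whose classical difficulty differs by more than `cutoff`."""
    return abs(p_first - p_second) > cutoff


# Hypothetical responses: four examinees by three items.
example_responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
example_difficulty, example_discrimination = classical_item_stats(example_responses, 0)
```

Because the difficulty index averages over all examinees, it cannot show where along the ability continuum two administrations differ.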
There are four categories by which the classical test 
theory results and the latent trait results can be compared: 
(a) both methods identified a given item as behaving in 
a similar manner each time the item was administered to 
each group of examinees; 
(b) both methods identified a given item as behaving 
differently each time the item was administered; 
(c) latent trait theory methodology identified a given 
item as behaving differently each time that the test 
was administered, but classical test theory methodology 
identified the item as behaving in a similar manner
each time the item was administered; and
(d) classical test theory methodology identified a
given item as behaving differently each time that the
test was administered, but latent trait theory
methodology identified the item as behaving in a
similar manner each time the item was administered.
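
The four categories above amount to a two-by-two cross-classification, which can be sketched in Python as follows. The function name and the item numbers in the example are hypothetical illustrations, not the item sets reported in this study.

```python
def compare_methods(all_items, flagged_classical, flagged_latent):
    """Sort items into categories (a) to (d) from two sets of flagged items."""
    categories = {"a": [], "b": [], "c": [], "d": []}
    for item in sorted(all_items):
        by_classical = item in flagged_classical
        by_latent = item in flagged_latent
        if not by_classical and not by_latent:
            categories["a"].append(item)   # both methods: similar behavior
        elif by_classical and by_latent:
            categories["b"].append(item)   # both methods: different behavior
        elif by_latent:
            categories["c"].append(item)   # latent trait method only
        else:
            categories["d"].append(item)   # classical method only
    return categories


# Hypothetical example with six items.
example_categories = compare_methods(
    all_items=range(1, 7),
    flagged_classical={2, 3},
    flagged_latent={3, 4},
)
```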

While category (a) is perhaps the least interesting, it 
is important to note that the only time that there was 
complete agreement between the two methods was when the 
items were found by both groups of examinees to be very 
easy. This does, however, make sense because nearly all of 
the examinees at all ability levels got these items correct. 
Consequently, there was little discrimination among the 
examinees for either analysis procedure to detect 
significant differences. It would also be reasonable to 
suspect agreement between the two methods if an item were 
extremely difficult and guessing were not influencing the 
data. If guessing were influencing the data, the latent 
trait theory methodology would be more sensitive to 
differences between the two groups (if the three parameter 
latent trait model were employed) than the classical test 
theory methodology would be and consequently, there would 
not be agreement between the two methodologies. 

Category (b) occurred with items 12, 16, 21, 23, 24,
[...]. While the two methods of
analysis did agree on the fact that these items behaved
differently each time that they were administered, the
conditions which led to the identification of these items
were different; i.e., the item difficulty index that was
obtained by the classical test theory methodology represents
the average difficulty of an item across all levels of
examinee ability, whereas the difference value obtained from
the latent trait methodology employed in this study
represents a weighting of:
1. the probability of success of the examinees of each
   ability level;
2. the frequency differences found among the various
   ability levels; and
3. the sensitivity to differences in discrimination values
   for a given item between the two study groups.
In addition, it is possible to evaluate the probability of
success for each examinee ability level through the item
characteristic curves. So, the underlying approach of the
two methods is quite different, even though they came to the
same conclusions for these items.

Category (c) exists for items [...] and 20
(Figures 3, 6, 10, and 12, respectively). The pattern of
differences is similar for each of these items. The
classical test theory methodology is insensitive to the fact
that a greater number of examinees in the medium ability
range had a greater difference of probability of success
between the two study groups (1956 and 1977) than did the
examinees who had either high or low abilities.
Consequently, classical test theory methodology would not
identify these items as behaving differently on each of the
two different test administrations, while latent trait
theory methodology would.

Items [...] and 29 (Figures [...], 14,
18, and 19, respectively) can be located in category (d).
Once again, the pattern of differences of item difficulty 
between the two study years as they are measured by the two 
different methodologies is similar for these items: i.e., 
the classical test theory methodology is not sensitive to 
the fact that there are fewer examinees found at the extreme 
ability levels. Consequently, classical test theory 
methodology is more likely to be influenced by differences 
in the extreme ability levels than is latent trait theory 
methodology. 

The implications of the classical test theory 
methodology and the latent trait theory methodology, as they 
were applied to these data, would have likely been more 
similar (except that latent trait theory methodology also 
allows the researcher to assess the probability of success 
for each examinee ability level) if the Rudner (1977) method 
of determining item bias were employed; i.e., the difference
between the item characteristic curves that was obtained by
weighting the probability of success of a given examinee
ability level by the frequency of that ability may have
been a more sensitive indicator of item bias across
different administrations of the same test than would have
been the other methods of analysis that are currently
available.
D. Conclusions

Latent trait theory successfully provides a
methodology which can assess the validity of comparing
scores that were obtained from a test-retest design, whether
or not the retest was administered to the same population or
sample as was the original test,9 in order to assess
achievement differences.

The methodology advocated by this thesis, in order to
determine which items should be included in the calculation
of a mean difference score, is as follows:
1. Calculate the item parameters for each item for each
   test administration.
2. Determine if any of the items have difficulty
   parameters that are greater than +4.00 or less than
   -4.00. If there are, then these items should not be
   included in the parameter rescaling procedure that
   follows.
3. Rescale the item parameters from the two test
   administrations so that they are on the same scale.
4. Calculate the probable item difference score by
   multiplying the differences, between the two test
   results, of the probability of success for a given
   ability level by the proportion of total scores found
   at that ability level, then summing across all ability
   levels (see equation 7).
5. Determine a level of probable item differences that
   would constitute an aberrant item.
6. Remove the data obtained from the aberrant item(s).
7. Calculate the examinee abilities (from the combined
   test-retest data) using the revised data.
8. Calculate the mean group ability from each test
   administration.
9. Determine the achievement difference between the two
   test administrations in ability units.

9 Although not demonstrated in this thesis, latent trait
theory does not necessarily presuppose that the two tests
were identical, but rather that they measure the same latent
trait.
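
Step 4 of this procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reproduction of equation 7: it assumes the three-parameter logistic model with scaling constant D = 1.7, takes the absolute difference between the two curves at each ability level (equation 7 may use signed differences), and uses hypothetical item parameters and ability proportions.

```python
import math


def icc_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item characteristic curve (assumed model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def probable_item_difference(params_first, params_second, ability_proportions):
    """Weight the between-administration difference in probability of success
    at each ability level by the proportion of examinees at that level, then sum.

    params_first, params_second: (a, b, c) tuples already on a common scale.
    ability_proportions: dict mapping ability level -> proportion of scores.
    """
    total = 0.0
    for theta, proportion in ability_proportions.items():
        p_first = icc_3pl(theta, *params_first)
        p_second = icc_3pl(theta, *params_second)
        total += abs(p_first - p_second) * proportion
    return total


# Hypothetical proportions of examinees at five ability levels (sum to 1).
proportions = {-2.0: 0.1, -1.0: 0.2, 0.0: 0.4, 1.0: 0.2, 2.0: 0.1}

# Hypothetical rescaled parameters (a, b, c) for one item in each administration.
first_administration = (1.0, 0.0, 0.2)
second_administration = (1.0, 0.5, 0.2)

score = probable_item_difference(first_administration, second_administration, proportions)
is_aberrant = score > 0.05   # cutoff chosen in step 5
```

Weighting by the ability proportions is what keeps sparsely populated extreme ability levels from dominating the score.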


This procedure provides the researcher with several
benefits because it allows the researcher to:

1. visually assess how the difficulty (b) parameter and
   the discrimination (a) parameter affect the different
   students across all levels of ability through the
   production of the item characteristic curves;

2. determine whether or not an item is measuring the same
   underlying trait each time that it is administered;

3. take into account the number of examinees at each
   ability level in determining whether or not an item is
   favoring a given test administration group.
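
The first of these benefits can be illustrated numerically as well as
graphically. The sketch below is an assumption-laden example rather
than the procedure used in this study: it takes a two-parameter
logistic curve with D = 1.7 as the model, uses hypothetical function
names, and tabulates the probability of success for three invented
items so that the separate effects of the discrimination and
difficulty parameters can be read across ability levels.

```python
import math


def icc(theta, a, b, D=1.7):
    """Assumed 2PL item characteristic curve: P(success | theta)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def icc_table(items, thetas):
    """One row per ability level, one column per (a, b) item."""
    return [[round(icc(theta, a, b), 3) for a, b in items]
            for theta in thetas]


# Invented items: low discrimination, high discrimination, and a
# high-discrimination item that is one ability unit more difficult.
items = [(0.5, 0.0), (1.5, 0.0), (1.5, 1.0)]
thetas = [-2, -1, 0, 1, 2]

for theta, row in zip(thetas, icc_table(items, thetas)):
    print(f"{theta:+d}", row)
```

Reading down a column shows how steeply the probability of success
rises with ability (discrimination); comparing the last two columns
shows the curve shifting toward higher ability as difficulty increases.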


While there are several benefits obtained when using a
latent trait analysis, there are some drawbacks that must
also be considered, e.g.,

1. The number of examinees that need to be included in the
   testing procedure needs to be fairly large in order to
   assure that there is a representative sample of
   examinees located at each ability level.

2. The algorithm used to compute the student abilities and
   the item parameters requires the use of a large
   main-frame computer or an array processor. Generally
   speaking, the casual user has limited access to such
   machines.

3. At the present time, the robustness of the latent trait
   models has not been fully investigated, hence it is not
   known what happens when the underlying assumptions are
   violated.

In addition, there are two other points of interest 
that should be considered: 

1. The breaking away from classical test theory
   methodology should be approached cautiously in that,
   while latent trait theory has been shown to be
   sound, the methods by which it has been implemented are
   often untested in relation to the underlying
   assumptions of the original theory (see Lumsden, 1976).

2. The idea of reporting achievement scores in ability
   units may be confusing to persons outside of the test
   measurement area. This point may be considered minor by
   some practitioners, but it is important to the
   average teacher; i.e., how many teachers understand
   what a z-score is, let alone understand the meaning of
   a unit of ability?

At the present time, latent trait theory is being
applied to large scale testing projects by researchers with
backgrounds in measurement and computer applications. While
there is no immediate application of latent trait
methodology for the average teacher, the benefits of latent
trait methodology can readily be seen in standardized
testing programs (e.g., E.T.S.; Lord, 1977) and in item
banking programs that are involved in the constitution of
tailored tests (e.g., Urry, 1977). As latent trait
methodology is further tested and its strengths and
weaknesses become known, latent trait methodology will
no doubt continue to become more widely implemented.

With respect to the Edmonton Grade III Achievement 
Study data, the latent trait methodology proposed in this 
thesis suggests the following results: 
1. In all of the four tests evaluated, there were items
   that were found to be favoring either the 1956 students
   or the 1977 students at given examinee ability levels.
   This would imply that these tests were, in fact,
   measuring different latent traits when they were
   administered to each study group. This makes intuitive
   sense since it would be naive to believe that there
   would have been no curriculum changes occurring over
   the last 21 years in Grade III. While latent trait
   methodology can identify those items that did not
   behave in the same manner each time that the test was
   administered, latent trait theory cannot tell the
   researcher why the item behaved differently. The
   researcher must speculate as to the cause by
   investigating the changes that have occurred with
   respect to the material presented to the examinee by
   the item in question: both internally, within
   respective school curricula, and externally, outside
   of the formal school systems.

2. In all of the four tests evaluated, there was no
   consistent pattern of bias found among the identified
   aberrant items. Essentially, this implies that, on the
   average, neither study group was favored more than the
   other when the tests were administered. However, there
   were differences with regard to specific items. When
   the data from these items were removed, prior to
   analysis, then the following results were found:

   a. the 1977 students obtained a score of 0.46 ability
      units above the 1956 students on the Raven
      Progressive Matrices test;

   b. the 1977 students obtained a score of 0.189
      ability units above the 1956 students on the Gates
      Paragraph Reading Test;

   c. the 1977 students obtained a score that was 0.239
      ability units above the 1956 students on the Gates
      Word Recognition test; and

   d. the 1977 students obtained a score that was 0.472
      ability units above the 1956 students on the
      California Mental Maturity Test.

   Since the 1956 students were reported by Clarke et al.
   (1978) as being 2.53 months older, on the average, than
   the 1977 students, the differences found between
   the 1956 and 1977 students would likely increase if the
   tests had been conducted for the same age groups.

These results suggest that with regard to these four tests, 
the 1977 students, on the average, performed better than the 
1956 students. As these groups could be construed as being 
populations, any difference between them could be considered 
significant, but are these differences meaningful? The 
answer rests with what the reader would wish to use these 
results for. 

It is interesting to note that the two "intelligence"
tests (the Raven Progressive Matrices and the California
Mental Maturity tests) were in close agreement as to the
mental ability difference between the two study
groups (0.46 and 0.472 ability units, respectively). If the
premise is accepted that both tests measure "intelligence",
then the closeness of the ability difference scores would
tend to support the appropriateness of the latent trait
methodology employed within this study.

In summary, the methodology described in this thesis
appears to be a viable method for determining item bias when
different groups of examinees have been administered a given
test at different times and this, in turn, allows the
researcher to accurately assess achievement differences
between the two study groups.


GATES ADVANCED PRIMARY READING TESTS

For Grade 2 (Second Half) and Grade 3

Type 2. Paragraph Reading

Write your name here
When is your birthday?


2. Draw a line around the milk bottle.

4. Draw a line from the pig to the tree.


To the Examiner: 1. See that each child has a pencil. 2. Distribute
papers. 3. Have children fill in blanks at the top of the page.
4. Instructions to children: “We are going to see how well you can
read. Do you see the stories and pictures on the front page of your
booklet? Everyone look at the first story, up here (illustrating with
your own copy). What does it say to do? (Have child answer.) That’s
right, put an X on the ball. Everyone find the ball and put a cross on
it. Be sure you put it right on the ball. (Check to see that they all
have marked it correctly.) Now look at the box right under that one.
What does this story tell you to do? (Have child answer.) That’s
right, draw a line around the milk bottle. Everyone find the milk
bottle on your paper and draw a line around it. Be sure to put it all
around the bottle exactly as the story asks you to. (Check to make
sure it is done correctly.) Now look at the first box on the next
side, up here (illustrating with your own paper). What does that story
say to do? (Have pupil answer.) That’s right, draw a line under the
little book. Be sure you find the little book, and be sure you draw
the line under it exactly as the story asks you to. (Check to make
sure papers are marked correctly.) Now look at the box under that one.
What does this story ask you to do? (Have pupil answer.) That’s right,
draw a line from the pig to the tree. Do it on your paper. Be sure it
goes from the pig to the tree exactly as the story asks you to. (Check
to make sure it is done correctly.) Do not open your books until I
tell you to. Now I am going to show you what we are to do next. On the
inside of the book are some more pictures and stories. (Examiner holds
up a copy of the test showing the inner pages.) You are to do No. 1
(Examiner points to it on his own copy), then go on and do No. 2, then
do the next one, and the next one, etc. (Examiner points down first
column, then second, etc., and also demonstrates order on all three
pages.) As soon as you have finished one story, you must go right
ahead and do the next one right below it. Now remember, first you are
to read the story below the picture; then you are to take your pencil
and do exactly what the story tells you to do. Do you understand? All
right. Open your books and BEGIN. Go ahead.” 5. Inspect the work of
each child; see that each works from top to bottom of columns and that
each follows the pages in order. Urge the children individually to try
the examples in order, but do not tell them the answers. Discourage
dawdling over difficult problems; tell them to try the next. 6. The
signal STOP is given at the end of 25 minutes. Collect papers
immediately. 7. The score is the number of directions which are
followed correctly. The mark made must be the one specified in “the
story” to be correct. For further details with respect to this test
see the Manual of Directions.


1. Put an X on the little white kitten with black spots on his
back.

2. The hen has just laid an egg. Draw a line from the hen to her
egg.

3. People live in houses. Animals live in barns. Put an X on
the place where the animals live.

4. Every morning this boy combs his hair. Draw a line from the
boy to what keeps his hair neat.

5. Nuts grow on trees. Draw a line from the squirrel to what he
must climb to get his food.

6. People buy newspapers. They like to read the news. Draw a
line under the boy who has newspapers to sell.

7. Mother set the table. She forgot the napkin at this place. It
belongs next to the fork. Mark an X where the napkin belongs.

MAPLE ELM PINE
8. In the fall the maple trees lose their leaves. The pine is green all
winter. Draw line under name of tree which does not lose its leaves.


9. In this quiet village the church
bells ring on Sunday morning.
All the people go to church. Put
an X on the place where the people
go on Sunday morning.


10. Some children are playing on
the beach. They want to dig in
the sand. Father is bringing them
something to use. Draw a line
from it to the children.


11. A bicycle has two wheels.
Wagons and cars have four wheels.
Engines often have six or eight.
Draw a line under something that
has two wheels.


12. In the early days of our country
people had to hunt in the woods
for their food. They shot deer,
rabbits, and even bears. Draw a
line from the hunter’s gun to
something he shot in the woods.


13. In the West some wild horses
still live on the plains. Once a
year men ride out to catch them.
Find a picture of a horse that has
not been tamed. Put an X on him.


14. Baby is playing in his pen. He
has dropped his toy. Brother will
get it for him. Look for the toy.
Draw a line from the toy to the
one who dropped it.


15. We had a big Thanksgiving
dinner. First came soup and then
turkey with vegetables. Last came
pie and cheese. Make an X on the
picture of the first thing eaten.


16. [...] and lettuce are green vegetables.
Corn is a vegetable too, but it is yellow or
white. Draw a line under a vegetable
that is not green.


17. Arthur Brown lives on a large estate. 
Once a year his gardens are open to the 
public. Then the entrance gates are opened 
wide and the people drive in. Draw a 
line under the high fence which surrounds 
this estate and mark an X on what is 
opened to the public once a year. 


18. Fans have long been in fashion. 
Thousands of years ago palm leaf fans 
were waved in Egypt. Beautiful ladies 
once carried feather fans to balls. To- 
day we have electric fans whose blades 
are rubber. Put an X on the feather 
fan. Draw a line around the modern fan. 




19. Our five senses — seeing, hearing, 
smelling, tasting, and touching — are rep- 
resented by the pictures above of an eye, 
an ear, a nose, a mouth, and a hand. 
Which picture represents the sense of 
touch? Mark an X on it. Which picture 
represents the sense of smell? Draw a 
line around it. 


20. Mary is making an apron. She is 
going to trim it with lace and ribbons. 
The lace will go around the edge of the 
apron and a bow of ribbon will go on each 
pocket. Draw a line from each bow to its 
place on the apron. Mark an X on what 
will go around the edge of the apron. 


EAST ——o 


21. This road has a dangerous curve in 
it. If people were warned of the curve 
there would be fewer accidents. Mark X 
where you would put a sign to warn cars 
traveling east. Mark O on the road to 
show where you would put a sign to warn 
cars traveling west. 


TRAIN | TRACK |_| REMARKS 
NO.18 ARRIVES ON TIME 
NO.30 | DEPARTS| ZAM. 


10 MIN.LATE 


22. Railway stations post arrivals and de- 
partures of trains on blackboards similar to 
the one above. If you wish to meet Train 
No. 42, it will arrive on time, coming in on 
Track No. 18. Draw a line around the 
number of the train that will be ten minutes 
late. Place an X on the track number of 
the train departing at 11:42 A.M. 


23. From the stalks of the blue-flowered 
flax plant the ancient Egyptians wove linen 
to wrap their dead. Linen is still made 
from flax, but of greater importance is the 
oil from its seeds, used in making paint. 
Put an X on the part of the plant used for 
linen. Draw a line around the picture 
showing the use of a modern product of 
flax seeds. 


MERCURY 
DEGREES 
ZERO 
EXPANDS 
CONTRACTS 


24. A thermometer measures temperature.
Mercury, enclosed in a glass tube, rises when
heat increases and contracts when heat decreases.
We read the temperature in degrees
above and below zero. Put an X on
the word that tells how mercury acts when
it grows colder. Draw a line under the
word that tells in what form temperature
is read.


GATES ADVANCED PRIMARY READING TESTS
For Grade 2 (Second Half) and Grade 3


Type 1. Word Recognition


Write your name here
When is your birthday?
Grade


To the Examiner: 1. See that each child has a pencil. 2. Dis- 
tribute papers. 3. Have children fill in blanks at the top of the 
page (with your help). 4. Instructions to children: “I want you
to look at the first picture, this one up here (holding up your 
copy and pointing to the picture of the dog). Next to it there 
are some words. One of the words goes with the picture. You 
are to draw a ring around that one word that tells about the pic- 
ture. Put your finger on the word that belongs with the picture. 
What is it? (Let one child answer.) That’s right, ‘dog.’ The 
four words are ‘did,’ ‘egg,’ ‘dog,’ and ‘two’ (pointing to the 
words on your own copy and making sure children look up at 
your copy). We are going to draw a ring around the word dog 
because that’s the one that tells the most about the picture. 
Everyone find the word ‘dog’ on your paper and draw a ring 
around it. (Check to make sure children have marked the correct 
word.) Now look at the box right underneath that one. Find the 
word there that goes with the picture. What is it? (Let a child 
answer.) That’s right, ‘bed.’ The four words are ‘be,’ ‘bed,’ ‘bag,’
‘she.’ We are going to draw a ring around the word ‘bed’ because 
that’s the one that tells us the most about the picture. Everyone 
find the word ‘bed’ and draw a ring around it. (Check to make 
sure that each child has marked the correct word. Continue in 
the same way for the third and fourth boxes. When you are illus- 
trating with your copy ask the children to look up if need be.) 


Do not open your books until I tell you to. Now I am 
going to show what we are to do next. Inside the book are 
some more pictures and words. (Examiner holds up copy of the 
test showing the inner pages.) You are to do the first one, then 
the next one below it, etc. (Examiner points down first column, 
then second, etc., and also demonstrates order on all three pages.) 
As soon as you have drawn a ring around the one word for 
one picture, go right ahead and do the next one. Now remember, 
first you are to look at the picture, then at the words next to the 
picture, then find the one word that goes best with the picture 
and make a ring around that one word. Make a ring around one 
word only for each picture. Do you understand? All right. Open 
your books and BEGIN. Go ahead.” 5. Inspect the work of each 
child; see that each works from top to bottom of columns and 
that each follows the pages in order. Urge children individually 
to try the examples in order but do not tell them the answers. 
Discourage dawdling over difficult problems; tell them to try the 
next. Watch for children who make rings indiscriminately and 
tell them to make only one ring for each picture. 6. The signal 
STOP is given at the end of 15 minutes. Collect papers imme- 
diately. 7. The score is the number of exercises marked correctly 
minus one-third the number incorrect. If more than one word 
in an exercise is marked, that exercise is scored as incorrect. For 
further details see the Manual of Directions. 


BUREAU OF PUBLICATIONS, TEACHERS COLLEGE 
COLUMBIA UNIVERSITY, NEW YORK 
Copyright, 1942, by Arthur I. Gates


paper apple
land been
floor gold
part draw
only before
bread great
lost food
man fire 
green horse 
woman world 
valley village 
snow settle 
doctor finger 
dinner ocean 
plate stamp 
swim stair 


orange 


occupy 


lumber 


lover 


grind 


string 


knee 


hang 


notice 


stream 


insect 


inquire 


button 


throat 


stroke 


ring 


knew 


forget 


forest 


forehead 


orchard 


merit 


author 


gross greedy 


prisoner grocer 


jacket hatch 


hatchet hateful 


distant meadow 


frost merchant 


harbor annual 


anchor appetite 


onion onward 


opinion ourself 


slender slipper 


closet supper


arrange arrow 


owner hero 


veil vigor 


rail 


stubborn trample 


stumble 


chirp 


chill 


raisin 


shamble 


sheer 


strike 


study 


thread 


needle 


stately 


chip 


sharp 


realize 


labor 


mast 


mask 


shark 


hark 


paint 


farm 


knife 


success


military 


weave 


mystery mirror 


gymnasium grassy 


gurgle gypsy 


wriggle nestle 


wrestle wringer 


dragon dying 


feeble dwelling 


advertise admiral 


affirm 


moral 


medical medal 


model meddle 


arbor argue 


harbor apron 


glimmer garrison 


comparison garter 


doughnut drawbridge 


dormitory donation 


chocolate chandelier 


chimpanzee chiffonier


equality epaulet 


pursuer 


pugilist 


rhetorical rickety 


equestrian questioner


California Short-Form 
Test of Mental Maturity 




GENERAL INSTRUCTIONS TO THE EXAMINER 


This test is primarily analytical and diagnostic 
but it also yields standardized test data including 
the customary M.A.'s and I.Q.'s. 


TIME LIMITS 


This is a power rather than a speed test.* How- 
ever, the time limits should be observed. They 
are ample for pupils to reach the practical limits 
of their abilities, and the test should be com- 
pleted in one testing period. 

Because of the wide differences in ability
represented among pupils of any typical grade
group, and between pupils of the first and third
grades, the time limits for this test are somewhat
more flexible than those for the middle and upper
grades; only upper time limits are given. For
this reason the examiner should watch the group
being examined and start the next item or subtest
if classes of advanced or bright pupils complete
the work before the specified time elapses.
Time should not be counted, of course, until pupils
actually begin work on an item or test.


CAUTION AGAINST COACHING 


It is important that pupils understand clearly 
the manner in which they are expected to in- 
dicate their responses. However, the examiner 
should remember that he is giving a test, and 
not directing a learning activity; therefore, the 
correct response should in no way be indicated 
for any item except in the practice exercises. 


IMMATURE PUPILS 


When given to slow or immature pupils, this 
test may be administered in small groups of 6 
to 15. The examiner may also fill in the iden- 
tifying data on the back cover-page before dis- 
tributing the test booklets. 


DIRECTIONS FOR ADMINISTRATION 


Suggested time allotment:

California Short-Form
Test of Mental Maturity
(1953 S-Form): about 42 minutes
(total testing time)


Materials required:


For each pupil:

1 test booklet, California Short-Form Test
of Mental Maturity

1 ordinary lead pencil with eraser attached,
or a crayola

1 eraser (if not attached to pencil or if
crayolas are used)

1 sheet of paper to be used as a marker


In addition, for the examiner:
extra pencils or crayolas
extra erasers
extra copy of test booklet,
for demonstration purposes, if necessary
stop watch, or watch or wall clock with
second hand.


After checking to see that all pupils have
pencils or crayolas, erasers, and markers,
distribute the test booklets, face-up.

From this point on, certain parts of these
directions are printed in this different type face. These
parts are to be read to the pupils.


* Burtt, Harold E. Principles of Employment Psychology, Harper, 1942.
p. 138.


SAY: Look at the bottom of the little book you 
have just been given. It says: To Boys and 
Girls: This test booklet has some games you 
will like. They will show how well you can 
think. Do as many of them as you can. Do not 
turn this page until told to do so. 

If the identifying data on the back cover have 
been filled in before the test booklets were dis- 
tributed, omit the materials between the horizontal 
lines. 


SAY: Now turn the test booklet over. Notice in 
the light space in the upper right-hand corner 
that there are lines for your name, grade, age, 
and so on. Write this information on these
three lines. 

Note the space set off by parentheses in the 
middle of the third line for identifying data. This 
space is provided for teachers or examiners who 
wish pupils to indicate their section, class, home 
room, etc., in order to facilitate the handling of 
data and test booklets after tests have been
scored. 

Give pupils time to record these data. Check 
to see that information is properly entered.


SAY: When you have finished, turn your game book
back to the front page and wait until I tell
you what to do.


When all pupils have finished,


SAY: Now open your game book to Test 1 and fold 
it back so that only the test shows. 


Demonstrate and be sure that all pupils under- 
stand. 


TEST 1 
Time required, about 3 minutes. 


SAY: Place your marker so you can see only the 
boy and girl at the top of the page. 


In having pupils mark the two sample items, 
do not correct their errors or explain what is 
meant by right or left. It is necessary only that 
they mark one of the boy’s hands and one of the 
girl's feet so they will understand how to mark. 
SAY: When you mark your answers put an X on 

whatever you are told. 

The examiner will draw a circle on the black- 
board and 
SAY: If I should tell you to put a mark on a circle

you would do it this way. 

The examiner will make an X on (in) the 
circle. 

SAY: A. Put a mark on the boy’s right hand. 
(Pause.) 
B. Now put a mark on the girl’s left foot. 

Take time to be sure that pupils are making 
marks even if they are wrong. 

SAY: Now move your marker down so you can see 
the girl, the boy, and the ball-player. 
1. Put a mark on the girl’s left arm. (Allow 
5 seconds.) 
2. Put a mark on the boy’s right foot. 
(Allow 5 seconds.) 
3. Put a mark on the ball-player’s left
foot. (Allow 5 seconds.)
Now move your marker down so you can see
the boy scout, the girl, and the boy standing
on his head.
4. Put a mark on the boy scout’s right arm.
(Allow 5 seconds.)
5. Put a mark on the girl’s left arm.
(Allow 5 seconds.)
6. Look at the boy standing on his head.
Put a mark on his left foot. (Allow 5 seconds.)
Now put your marker aside. Look at the 
pictures of hands and feet in the double boxes 
at the bottom of the page. In the double 
boxes put a mark on each right hand or foot. 
(Allow 20 seconds.) 
When the group have finished the tenth item, 


SAY: Stop. Now turn the page and fold it back 
so that only Test 2 shows. 
Be sure that pupils understand and have 


arranged their booklets properly. 
TEST 2
Suggested time limit, 7 minutes


SAY: Place your marker so you can see only the first
row of drawings. Look at the first drawing
and then the other drawings in the same row.


C. The examiner should point to the drawings 
in Row C. 


SAY: The first drawing is among the other draw- 
ings, but it is turned around or turned over. 
Find it and put a mark on it. 


Be sure that pupils marked the oval. 


SAY: Now move your marker down so you can see
the next drawings.


D. Do this one in the same way. Which is
the right answer? The last one is correct. It
is the first drawing turned upside down. Put
a mark on it.


The examiner should check to see that samples
C and D are correctly marked.


SAY: Now do all the others on this page in the same
way. Find one which is the same as the first
and put a mark on it. Do both sides of the
page. You may use your marker if you wish
to. Ready, begin.


After 5 minutes, 


SAY: Stop. Now turn the test booklet over so that 
only Test 3 shows. 


TEST 3
Time required, about 7 minutes 


SAY: Place your marker so you can see only the 
first row of pictures. You are to find some- 
thing in each row that is like the first two 
pictures and put a mark on it. 

E. Put a mark on the coat in the first row.
The pants, sweater, and coat are alike because 
they are all something to wear. That's why 
you mark on the coat. Now move your marker 
down so you can see the toy wagon, top, and 
other drawings. 

F. Which of these three pictures goes
with a toy wagon and a top? (Allow pupils
to answer.) Yes, ball. The toy wagon, top,
and ball are alike because they are all toys.
Put a mark on the ball.


The examiner should check to see that samples 
E and F are correctly marked. 


SAY: Now do the others on this page in the same 
way. Use your marker if you wish to. Put a
mark on the picture that goes with the first 
two pictures in each row. Ready, begin. 


After 2 minutes, 
SAY: Be sure to do both sides of the page. 
After 7 minutes, 


SAY: Stop. Now turn the page over and fold it back
so that only Test 4 shows. 


TEST 4 
Time required, about 7 minutes 


SAY: Place your marker so you can see only the
first row of drawings. This is a game to see
how well you can think.

Listen to what I say and then put a mark on
the picture that is the correct answer.


The examiner should read with a clear distinct 
tone so that all pupils can hear without effort. 


SAY: 


G. Look at the first two boys. Bill caught 
more fish than Ned. Put a mark on Bill. 
(Pause.) He is the boy with the most fish. 
Move your marker down so you can see two 
girls with flowers. 

H. Alice found more flowers than Jane. 
Put a mark on Jane. (Pause.) Move your 
marker down so you can see two more girls. 


Be sure that samples G and H are correctly 
marked. 


SAY: 


1. Mary and Jane’s mother said, “I will give
a ring to the one that does not break any
dishes.” Jane broke a cup. Put a mark on
Mary. (Allow 5 seconds.) Move your marker
down so you can see three boys.


2. The teacher said, “The boy that does
the best work may be traffic officer next
week.” Bob did the best work. Put a mark
on Bob. (Allow 5 seconds.) Move your marker
down so you can see some drawings of night
and day.


3. The teacher said, “If the sun shines, it
is day.” The sun was shining. Put a mark
on the picture that shows this. (Allow 5
seconds.) Move your marker down so you can
see three flags.


4. The class voted that the two children
who made the most points should lead in the
salute to the flag. Mary and Jim earned the
most points. Put a mark on the picture that
shows Mary and Jim. (Allow 5 seconds.) Put
your marker aside so you can see three girls.


5. Jane’s hat is larger than Mary’s. Mary’s
hat is larger than Alice’s. Put a mark on
Alice’s hat. (Allow 5 seconds.) Move your
marker to the top of the page so that you
can see three boys running.


6. See the three boys who are running a
race. Jim runs faster than Charles but not
as fast as Tom. Put a mark on Tom. (Allow
7 seconds.) Move your marker down so you
can see three boys jumping.


7. Jack jumps higher than Harry. Bob
jumps higher than Harry. Put a mark on
Harry. (Allow 7 seconds.) Move your marker
down so you can see the boys on a ladder.


8. Three boys are up a ladder. Ned is
farther up than Bill. Jim is farther up than
Ned. Put a mark on Ned. (Allow 7 seconds.)
Move your marker down so you can see three
men.


9. Bill said, “The man at the door is either
a policeman or a mail carrier. But he is not
a policeman.” Put a mark on the picture
that shows which man is at the door. (Allow
7 seconds.) Move your marker down so you
can see the boys, girls, and a bus.


10. The children either ride to school in
the bus or they walk. They did not walk this
day. Put a mark on the picture which shows
how they got to school. (Allow 7 seconds.)
Move your marker down so you can see four
boys.


11. Jack is the first boy. He said, “My
brother is taller than I am, or he is shorter,
or he is the same size. But my brother is not
taller nor is he shorter.” Put a mark on
Jack’s brother. (Allow 7 seconds.) Put your
marker aside so you can see three houses.


12. Jane’s house is nearer the street corner
than Betty’s. Betty’s house is nearer than
Clara’s. Put a mark on Betty’s house. (Allow
10 seconds.)


When the group have finished the twelfth item, 


SAY: Stop. Now turn the test booklet over so that 
only Test 5 shows. 
TEST 5 


Suggested time limit, 6 minutes 


SAY: 


Be 
SAY: 


Place your marker so you can see only the 
first row of drawings. 


1. Put a mark on the thing than can go 
the fastest. Did you put a mark on the air- 
plane? That is the right answer. Now move 
your marker down so you can see the football 
and other drawings. 


1. Puta mark on the thing that is lightest. 
(Allow 5 seconds.) Move your marker down 
so you can see the three animals. 


2. Put a mark on the thing that can pull 
the heaviest load. (Allow 5 seconds.) Move 
your marker down so you can see the leaves 
and other drawings. 


3. Put a mark on the box that has the 
most things in it. (Allow 10 seconds.) Move 
your marker down so you can see the pies. 


4. Put a mark on the plate that has the 
most pie on it. (Allow 5 seconds.) Move 
your marker down so you can see the clocks. 


5. Put a mark on the clock that shows 
the latest time. (Allow 5 seconds.) Put 
your marker aside so you can see the stamps. 


6. Puta mark on the group of stamps that 
costs the most. (Allow 5 seconds.) 

Now move your marker up to the top of 
the page so you can see some drawings of 
boxes with the little circles or marbles in them. 

J. In this first row the marbles in the 
boxes count up by ones. One of the boxes 
is wrong. It is the next to the last box. It 
should have four marbles. Put a mark on 
the box that is wrong. 


Be sure that pupils mark the fourth box. 


SAY: One box is wrong in each row. Put a mark 
on the box that is wrong. Do the rest of the 
rows on this page in the same way. 


After 3 minutes, 


SAY: 


Stop. Now turn the page over and fold it 
back so only Test 6 shows. 


TEST 6 
Time required, about 8 minutes 


SAY: Place your marker so you can see only the 
first row of boxes with sticks in them. 

K. Look at the first box. It has three 
sticks. Put a mark on the box that has one 
more stick. 

Check to see that pupils mark the last box in 
the first row. 


SAY: Move your marker down one row so you can 
see the blocks. 

L. Look at the first box in this row. It 
has three blocks. If you take one of the 
blocks away, which box will it look like? Put 
a mark on the box with two blocks in it. 


Check to see that pupils mark the first box 
after the dotted line. 


SAY: Now move your marker down so you can see 
the cherries. 


1. Look at the cherries in the first box. 
If they were all together put a mark on the 
box which shows how they would look. (Allow 
10 seconds.) Move your marker down so you 
can see the chickens. 

2. Look at the chickens in the first box. 
They are in two pens. If they were all in one 
pen which box would show how they look? 
Put a mark on it. (Allow 15 seconds.) Move 
your marker down so you can see the birds. 

3. Look at the birds in the first picture. 
Put a mark on another picture that has the 
same number of birds. (Allow 15 seconds.) 
Move your marker down so you can see the 
dishes. 

4. Look at the dishes in the first box. 
Some of them are broken. Put a mark on the 
box that shows how many dishes there are 
that are not broken. (Allow 15 seconds.) 
Put your marker aside so you can see the 
shells. 

5. In the first box are the shells that Mary 
found. Jane found twice as many. Put a 
mark on the box that shows Jane’s shells. 
(Allow 15 seconds.) Now move your marker 
to the first row at the top of the page so you 
can see the baseballs. 

6. The baseballs in the first box belong 
to Bill. If he gives half of them to his 
brother, put a mark on the box which shows 
how many baseballs he gave to his brother. 
(Allow 20 seconds.) Move your marker down 
so you can see the goldfish bowls. 

7. If we take out one-fourth of the goldfish 
in the first bowl, put a mark on the bowl 
which shows how many goldfish that would 
be. (Allow 30 seconds.) Move your marker 
down so you can see the clothespins. 

8. If one-third of the clothespins on the 
first line fell off, put a mark on the line that 
shows how many clothespins would be left on 
the line. (Allow 30 seconds.) Move your 
marker down so you can see the coins. 

9. You can take away one of these four 
coins and have 16 cents left. Put a mark on 
the coin you can take away. (Allow 30 sec- 
onds.) Move your marker down so you can 
see the marbles. 


10. In the first box are some marbles. If 
two are given to Jack and two are given to 
Bill, put a mark on the box that will show 
the number of marbles that are left. (Allow 
about 30 seconds.) Move your marker down 
so you can see the cookies. 


11. In the first box are three sets of 
cookies. If they were divided equally among 
three children, put a mark on the box which 
shows how many cookies each would have. 
(Allow 30 seconds.) Put your marker aside 
so you can see the plants. 


12. In the first box are some plants. If 
they are all set out in three rows, put a mark 
on the box that shows the number of plants 
there will be in each row. (Allow 30 seconds.) 


When the group have finished the twelfth 
item, 
SAY: Stop. Now turn the test booklet over so that 
only Test 7 shows. 


TEST 7 
Time required, about 6 minutes 


SAY: Place your marker so you can see only the 
first row. 

M. You are to mark the picture that I 
name. Bird. Put a mark on the bird. Move 
your marker down. 

N. Fish. Put a mark on the fish. 

The examiner should check to see that pupils 
have marked samples M and N correctly. 

The directions to be given each time are: 
Move your marker down so you can see the pictures 
in number .............. (Pronounce the number.) 
Then pronounce the word and say: Put a mark 
on the .............., thus pronouncing the test word 
twice. Pause about 2 or 3 seconds for pupils to 
mark each item. 


TEST VOCABULARY 
1. Frog 
2. Plant 
3. Wigwam 
4. Bunch 
5. Reindeer 
6. The man inside 
7. Blossom 
8. Twig 
SAY: Move your marker up to the top of the page 
so you can see the flower, ship and bananas. 


9. Something delicious 
10. Shelter 

11. Something nibbling 
12. The cow between 

13. Tiniest person 

14. Insect 

15. Signal 

16. Refrigerator 

17. Those descending 

18. Something comfortable 
colt, and car. 
19. Vehicle 
20. Athlete 
21. One who defends 
22. Cultivating 
23. Distress 
24. Eclipse 
25. Studious person 
26. Constructing 
27. Absurd picture 
28. Venerable person 


When pupils have had time to attempt the 
twenty-eighth item, 
SAY: Stop. Put your pencil down. 


Collect the scratch paper, test booklets, and 
any pencils that have been distributed. 

Devised by 
ELIZABETH T. SULLIVAN, WILLIS W. CLARK, AND ERNEST W. TIEGS 


TO BOYS AND GIRLS: 


This test booklet has some games you will like. They will show how 
well you can think. Do as many of them as you can. 
DO NOT TURN THIS PAGE UNTIL TOLD TO DO SO. 


PUBLISHED BY CALIFORNIA TEST BUREAU — 5916 HOLLYWOOD BOULEVARD — LOS ANGELES 28, CALIFORNIA 